ETAPS Foreword


Welcome to the 23rd ETAPS! This is the first time that ETAPS took place in Ireland,
in its beautiful capital Dublin.

ETAPS 2020 was the 23rd instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each 
conference has its own Program Committee (PC) and its own Steering Committee 
(SC). The conferences cover various aspects of software systems, ranging from 
theoretical computer science to foundations of programming language developments, 
analysis tools, and formal approaches to software engineering. Organizing these 
conferences in a coherent, highly synchronized conference program enables researchers 
to participate in an exciting event, having the possibility to meet many colleagues 
working in different directions in the field, and to easily attend talks of different 
conferences. On the weekend before the main conference, numerous satellite 
workshops took place that attracted many researchers from all over the globe. Also, for 
the second time, an ETAPS Mentoring Workshop was organized. This workshop is 
intended to help students early in the program with advice on research, career, and life 
in the fields of computing that are covered by the ETAPS conference. 

ETAPS 2020 received 424 submissions in total, 129 of which were accepted, 
yielding an overall acceptance rate of 30.4%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their 
contributions, and in particular the PC (co-)chairs for their hard work in running this 
entire intensive process. Last but not least, my congratulations to all authors of the 
accepted papers! 

ETAPS 2020 featured the unifying invited speakers Scott Smolka (Stony Brook 
University) and Jane Hillston (University of Edinburgh) and the conference-specific 
invited speakers (ESOP) Isil Dillig (University of Texas at Austin) and (FASE) Willem 
Visser (Stellenbosch University). Invited tutorials were provided by Erika Abraham 
(RWTH Aachen University) on the analysis of hybrid systems and Madhusudan 
Parthasarathy (University of Illinois at Urbana-Champaign) on combining Machine 
Learning and Formal Methods. On behalf of the ETAPS 2020 attendants, I thank all the 
speakers for their inspiring and interesting talks! 

ETAPS 2020 took place in Dublin, Ireland, and was organized by the University of 
Limerick and Lero. ETAPS 2020 is further supported by the following associations and 
societies: ETAPS e.V., EATCS (European Association for Theoretical Computer 
Science), EAPLS (European Association for Programming Languages and Systems), 
and EASST (European Association of Software Science and Technology). The local 
organization team consisted of Tiziana Margaria (general chair, UL and Lero), 
Vasileios Koutavas (Lero@UCD), Anila Mjeda (Lero@UL), Anthony Ventresque 
(Lero@UCD), and Petros Stratis (Easy Conferences).

The ETAPS Steering Committee (SC) consists of an Executive Board and
representatives of the individual ETAPS conferences, as well as representatives of 
EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns 
(Saarbrücken), Marieke Huisman (chair, Twente), Joost-Pieter Katoen (Aachen and 
Twente), Jan Kofron (Prague), Gerald Lüttgen (Bamberg), Tarmo Uustalu (Reykjavik
and Tallinn), Caterina Urban (Inria, Paris), and Lenore Zuck (Chicago). 

Other members of the SC are: Armin Biere (Linz), Jordi Cabot (Barcelona), Jean 
Goubault-Larrecq (Cachan), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid), 
Jurriaan Hage (Utrecht), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), 
Stefan Kiefer (Oxford), Barbara König (Duisburg), Fabrice Kordon (Paris), Jan 
Kretinsky (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Peter 
Müller (Zurich), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), 
Andrew M. Pitts (Cambridge), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), 
Bernhard Steffen (Dortmund), Mariëlle Stoelinga (Twente), Gabriele Taentzer
(Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), 
Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Nobuko Yoshida 
(London). 

I would like to take this opportunity to thank all speakers, attendants, organizers 
of the satellite workshops, and Springer for their support. I hope you all enjoyed 
ETAPS 2020. Finally, a big thanks to Tiziana and her local organization team for all 
their enormous efforts enabling a fantastic ETAPS in Dublin! 


February 2020 Marieke Huisman 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


This volume contains the papers presented at the 23rd International Conference on 
Fundamental Approaches to Software Engineering (FASE 2020) held during 
April 25-30, 2020, in Dublin, Ireland. FASE 2020 was organized as part of the annual 
European Joint Conferences on Theory and Practice of Software (ETAPS 2020). 

FASE is concerned with the foundations on which software engineering is built. The 
papers submitted covered topics such as requirements engineering, software
architectures, specification, software quality, validation, verification of functional and
non-functional properties, model-driven development and model transformation,
software processes, security, and software evolution. In particular, the 2020 edition of
FASE saw an increased number of papers with empirical studies. 

FASE 2020 had no separate abstract submission deadline and we received 81 
submissions on the paper deadline with 5 tool papers, 4 empirical evaluation papers and 
72 research papers. The submissions came from the following countries (in alphabetical 
order): Argentina, Australia, Austria, Belgium, Canada, China, Colombia, Denmark, 
Estonia, Finland, France, Germany, Greece, Hungary, India, Iran, Italy, Japan, 
Luxembourg, Macedonia, Netherlands, New Zealand, Norway, Portugal, Russia, 
Singapore, South Korea, Spain, Sweden, Switzerland, the UK, and the USA. Out 
of these submissions, we accepted 23 papers (28% acceptance rate) after the review and 
discussion phases with the Program Committee (PC) members plus 63 additional 
external reviewers. FASE again used a double-blind reviewing process. We thank the 
PC members and reviewers for doing an excellent job! 

This volume also contains an invited paper by our keynote speaker Willem Visser. It 
complements his talk on “The Magic of Analyzing Programs”. 

For the first time, FASE hosted the International Competition on Software Testing 
(Test-Comp 2020), chaired and organized by Dirk Beyer. Test-Comp 2020 is the 
second edition of an annual competition for testing tools providing a comparative 
evaluation of different tools. This edition contained 10 participating tools, from
academia and industry. These proceedings contain papers on six tools that participated
in the competition, as well as a summary by the competition organizer Dirk Beyer. The
tool papers were reviewed and selected by a separate PC: the Test-Comp 2020 jury. 
Each Test-Comp paper was assessed by at least three reviewers. 

We thank the ETAPS 2020 organizers, in particular, Tiziana Margaria, the general 
chair, and Vasileios Koutavas, Anila Mjeda, Anthony Ventresque, and Petros Stratis. 
We also thank Marieke Huisman, the ETAPS Steering Committee (SC) chair, for 
managing the whole process, and Gabriele Taentzer, the FASE SC chair, for swift 
feedback on several questions. 

We hope that you will enjoy reading this volume.


Improving symbolic automata learning
with concolic execution *


Donato Clun¹, Phillip van Heerden², Antonio Filieri¹, and Willem Visser²


¹ Imperial College London
² Stellenbosch University


Abstract. Inferring the input grammar accepted by a program is central
for a variety of software engineering problems, including parser
verification, grammar-based fuzzing, communication protocol inference,
and documentation. Sound and complete active learning techniques have
been developed for several classes of languages and the corresponding
automaton representations; however, there are outstanding challenges that
limit their effective application to the inference of input grammars.
We focus on active learning techniques based on L* and propose two
extensions of the Minimally Adequate Teacher framework that allow the
efficient learning of the input language of a program in the form of
symbolic automata, leveraging the additional information that can be extracted
from concolic execution. Upon these extensions we develop two learning
algorithms that significantly reduce the number of queries required to
converge to the correct hypothesis.


1 Introduction 


Inferring the input grammar of a program from its implementation is central
for a variety of software engineering activities, including automated documentation,
compiler analyses, and grammar-based fuzzing.

Several learning algorithms have been investigated for inferring a grammar
from examples of accepted and rejected input words, with active learning approaches
achieving the highest data-efficiency and strong convergence guarantees.
Active learning is a theoretical framework enabling a learner to gather
information about a target language by interacting with a teacher [1]. A minimally
adequate teacher that can guarantee the convergence of an active language
learning procedure for regular languages is an oracle that can answer membership
and equivalence queries. Membership queries check whether a word indicated by
the learner is accepted by the target language, and equivalence queries can confirm
that a hypothesis language proposed by the learner is equivalent to the
target language, or provide a counterexample word otherwise.


* This work has been partially supported by the EPSRC HiPEDS Centre for Doctoral
Training (EP/L016796/1), the DSI-NRF Centre of Excellence in Mathematical
and Statistical Sciences (CoE-MaSS), and a Royal Society Newton Mobility Grant
(NMG\R2\170142).


© The Author(s) 2020 
H. Wehrheim and J. Cabot (Eds.): FASE 2020, LNCS 12076, pp. 3-26, 2020.

However, when learning the input language accepted by a program from its
code implementation, it is unrealistic to assume the availability of a complete 
equivalence oracle, because such an oracle would need to check the equivalence 
between the hypothesis language and arbitrary software code. 

In this paper, we explore the use of concolic execution to design active learning
procedures for inferring the input grammar of a program in the form of a
symbolic finite automaton. In particular, we extend two state-of-the-art active
learning frameworks for symbolic learning by enabling the teacher to 1) provide
more informative answers for membership queries by pairing the accept/reject
outcome with a path condition describing all the input words that would result in
the same execution as the word indicated by the learner, and 2) provide a partial
equivalence oracle that may produce counterexamples for the learner's hypothesis.
The partial equivalence oracle relies on the exploration capabilities of
the concolic execution engine to identify input words for which the acceptance
outcome differs between the target program and the learner's hypothesis. To
guarantee the termination of the concolic execution for equivalence queries, we
set a bound on the length of the inputs the engine can generate during its
exploration. While necessarily incomplete, such an equivalence oracle may effectively
guide the learning process and guarantee the correctness of the learned language
for inputs up to the set input bound. Finally, we propose a new class of symbolic
membership queries that build on the constraint solving capabilities of the
concolic engine to directly infer complete information about the transitions between
states of the hypothesis language.

In our preliminary evaluation based on Java implementations of parsers for
regular languages from the Automatark benchmark suite, the new active learning
algorithms enabled by concolic execution learned the correct input language
for 76% of the subjects, despite the lack of a complete equivalence oracle, and
reduced the number of membership and equivalence queries produced by the
learner by up to 96%.

The remainder of the paper is structured as follows: Section 2 introduces
background concepts and definitions concerning symbolic finite state automata,
active learning, and concolic execution. Section 3 describes in detail the data
structures and learning algorithms of two state-of-the-art approaches (A* [11]
and MAT* [3]) that will be the base for the active learning strategies based on
concolic execution formalized in Section 4. Section 5 reports on our preliminary
experiments on the effectiveness and query-efficiency capabilities of the
new strategies. Finally, Section 6 discusses related work and Section 7 presents
our concluding remarks.


2 Preliminaries 
2.1 Symbolic finite state automata 


Symbolic finite state automata (SFA) are an extension of finite state automata
where transitions can be labeled with a predicate identifying a subset
of the input alphabet [28]. The set of predicates allowed on SFA transitions
should constitute an effective Boolean algebra [3], which guarantees closure with
respect to Boolean operations according to the following definition:


Definition 1. Effective Boolean algebra [3]. An effective Boolean algebra $\mathcal{A}$ is a
tuple $(D, \Psi, [\![\_]\!], \bot, \top, \vee, \wedge, \neg)$ where $D$ is the set of domain
elements; $\Psi$ is the set of predicates, including $\bot$ and $\top$;
$[\![\_]\!] : \Psi \to 2^{D}$ is a denotation function such that
$[\![\bot]\!] = \emptyset$, $[\![\top]\!] = D$, and for all $\varphi, \psi \in \Psi$,
$[\![\varphi \vee \psi]\!] = [\![\varphi]\!] \cup [\![\psi]\!]$,
$[\![\varphi \wedge \psi]\!] = [\![\varphi]\!] \cap [\![\psi]\!]$,
and $[\![\neg\varphi]\!] = D \setminus [\![\varphi]\!]$.


Given an effective Boolean algebra A, an SFA is formally defined as: 


Definition 2. Symbolic Finite Automaton (SFA) [3]. A symbolic finite automaton
$M$ is a tuple $(\mathcal{A}, Q, q_{init}, F, \Delta)$ where $\mathcal{A}$ is an effective
Boolean algebra, called the alphabet; $Q$ is a finite set of states; $q_{init} \in Q$ is
the initial state; $F \subseteq Q$ is the set of final states; and
$\Delta \subseteq Q \times \Psi_{\mathcal{A}} \times Q$ is the transition relation consisting
of a finite set of moves or transitions.


Given a linearly ordered finite alphabet $\Sigma$, through the rest of the paper we
will assume $\mathcal{A}$ to be the Boolean algebra over the union of intervals over $\Sigma$, with
the canonical interpretations of union, intersection, and negation operators. With
an abuse of notation, we will write $\psi \in \mathcal{A}$ to refer to a predicate $\psi$ in the
set $\Psi$ of $\mathcal{A}$. A word is a finite sequence of alphabet symbols (characters)
$w = w_0 w_1 \ldots w_n$ ($w_i \in \Sigma$), whose length is $len(w) = n + 1$. We indicate with
$w[: i]$ the prefix of $w$ up to the $i$-th element excluded, and with $w[i :]$ the suffix
of $w$ starting from element $i$. We will use the notation $w_i$ and $w[i]$ interchangeably.
The language accepted by an SFA $M$ will be indicated as $L_M$, or only $L$ when the
SFA $M$ can be inferred from the context. For an SFA $M$ and a word $w$, $M(w) = true$
if $M$ accepts $w$; $false$ otherwise.

Similarly to finite state automata, SFAs are closed under language intersection,
union, and complement, and admit a minimal form [3]. Compared to
non-symbolic automata, SFAs can produce more compact representations over 
large alphabets (e.g., Unicode), allowing a single transition predicate to account 
for a possibly large set of characters, instead of explicitly enumerating all of 
them. 
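
To make Definitions 1 and 2 concrete, the following is a minimal Python sketch of an SFA over the interval algebra. The names `Intervals`, `holds`, `SFA`, and the `has_digit` automaton are illustrative assumptions and do not come from the paper or from any existing library.

```python
from dataclasses import dataclass

# A predicate is a union of closed intervals over code points,
# e.g. ((0x30, 0x39),) for the decimal digits. (Hypothetical encoding.)
Intervals = tuple[tuple[int, int], ...]


def holds(pred: Intervals, ch: str) -> bool:
    """Denotation check: does the character belong to the set described by pred?"""
    return any(lo <= ord(ch) <= hi for lo, hi in pred)


@dataclass
class SFA:
    initial: int
    finals: set[int]
    # Each move is (source state, predicate, target state).
    moves: list[tuple[int, Intervals, int]]

    def accepts(self, word: str) -> bool:
        state = self.initial
        for ch in word:
            for src, pred, dst in self.moves:
                if src == state and holds(pred, ch):
                    state = dst
                    break
            else:
                return False  # no move is enabled for ch: reject
        return state in self.finals


TOP = ((0x0000, 0xFFFF),)
DIGIT = ((0x30, 0x39),)
NOT_DIGIT = ((0x0000, 0x2F), (0x3A, 0xFFFF))

# Hypothetical two-state SFA accepting every word that contains a digit.
has_digit = SFA(initial=0, finals={1},
                moves=[(0, NOT_DIGIT, 0), (0, DIGIT, 1), (1, TOP, 1)])
```

Three moves are enough to cover the whole 16-bit alphabet here, whereas an explicit FSA would need one transition per character.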


2.2 Active learning and minimally adequate teachers 


Active learning encompasses a set of techniques enabling a learning algorithm
to gather information by interacting with a suitable oracle, referred to as the
teacher. Angluin [1] proposed an exact active learning algorithm for a regular
language $L$ named L*. In L*, the learner can ask the oracle two types of queries,
namely membership and equivalence queries. In a membership query, the learner
selects a word $w \in \Sigma^*$ and the oracle answers whether $w \in L$ (formally, the
membership oracle is a function $O_m : \Sigma^* \to \mathbb{B}$, where $\mathbb{B} = \{true, false\}$). In
an equivalence query, the learner selects a hypothesis finite state automaton
(FSA) $H$ and asks the oracle whether $L_H = L$; if $L_H \neq L$, the oracle returns a
counterexample, i.e., a word $w$ on which $L$ differs from $L_H$ (formally, the
equivalence oracle is a function $O_e : FSA \to \Sigma^* \cup \{true\}$). A teacher providing both
$O_m$ and $O_e$, and able to produce a counterexample as a result of $O_e$, is called
a minimally adequate teacher. Given a minimally adequate teacher, L* is guaranteed
to learn the target language $L$ with a number of queries polynomial in
the number of states of a minimal deterministic automaton accepting $L$ and in
the size of the largest counterexample returned by the teacher [1].
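
The two query types can be sketched as a small Python interface. This is only an illustration under stated assumptions: the class `BoundedTeacher` is hypothetical, the equivalence check is a brute-force enumeration of all words up to a length bound (so it is a partial oracle, not a complete one), and it returns `None` rather than `true` when no counterexample is found.

```python
from itertools import product
from typing import Callable, Optional


class BoundedTeacher:
    """Hypothetical teacher answering membership and bounded equivalence queries."""

    def __init__(self, target: Callable[[str], bool], alphabet: str, max_len: int):
        self.target = target      # stands in for the target language L
        self.alphabet = alphabet  # small finite alphabet used for enumeration
        self.max_len = max_len    # bound on the length of candidate counterexamples

    def membership(self, word: str) -> bool:
        return self.target(word)

    def equivalence(self, hypothesis: Callable[[str], bool]) -> Optional[str]:
        # Enumerate words by increasing length and return the first disagreement.
        for n in range(self.max_len + 1):
            for chars in product(self.alphabet, repeat=n):
                word = "".join(chars)
                if self.target(word) != hypothesis(word):
                    return word
        return None  # no counterexample up to the bound


teacher = BoundedTeacher(lambda w: "ab" in w, alphabet="ab", max_len=3)
```

With this teacher, proposing the empty language as hypothesis yields a shortest word containing `ab` as counterexample, while proposing the target itself yields `None`.
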

Discovering FSA states. Consider an FSA $M$. Given two words $u$ and $v$ such
that $M(u) \neq M(v)$ (i.e., one accepted and one rejected), it can be concluded
that $u$ and $v$ reach different states of $M$. Moreover, if $u$ and $v$ share a suffix
$s$ (i.e., $u = a.s$ and $v = b.s$ with $a, b, s \in \Sigma^*$ and the dot representing word
concatenation), $a$ and $b$ necessarily reach two different states $q_a$ and $q_b$ of $M$. The
suffix $s$ is a discriminator suffix for the two states because its parsing starting
from $q_a$ and $q_b$ leads to different acceptance outcomes. The words $a$ and $b$ are
instead access words of $q_a$ and $q_b$, respectively, because their parsing from the
initial state reaches $q_a$ and $q_b$. This observation can be generalized to a set of
words by considering all the unordered pairs of words in the set. Because $M$ is a
finite state automaton, there can be only a finite number of discriminable words
in $\Sigma^*$ and, correspondingly, a finite number of distinct access strings identifying
the automaton's states.

State reached parsing a word. For a word $w$, consider a known discriminator
suffix $s$ and access word $a$. If $O_m(w.s) \neq O_m(a.s)$, the state reached parsing $w$
cannot be the one identified by $a$. Throughout the learning process, it is possible
that none of the already discovered access words identifies the state reached by
$w$. In this case, $w$ would be a suitable candidate for discovering a new FSA state
as described in the previous paragraph.
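
The check described above can be sketched as follows. The function `find_state` and the toy membership oracle are hypothetical; a returned access word only means that the known suffixes do not yet distinguish it from the given word.

```python
from typing import Callable, Optional, Sequence


def find_state(word: str,
               access_words: Sequence[str],
               suffixes: Sequence[str],
               member: Callable[[str], bool]) -> Optional[str]:
    """Return an access word not distinguished from `word` by any known suffix,
    or None if every known state is ruled out (candidate for a new state)."""
    signature = [member(word + s) for s in suffixes]
    for a in access_words:
        if [member(a + s) for s in suffixes] == signature:
            return a
    return None


def contains_digit(w: str) -> bool:
    # Toy membership oracle: the target language is "words containing a digit".
    return any(c.isdigit() for c in w)
```

For instance, with access words `""` and `"1"` and the empty suffix as the only discriminator, `"x7"` is mapped to `"1"`; with `""` as the only access word, `"7"` matches nothing and is a candidate new state.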

Discovering FSA transitions. For each access string $a$ and symbol $\sigma \in \Sigma$, a
transition should exist between the states reached parsing $a$ and $a.\sigma$, respectively.


2.3 Concolic execution


Concolic execution [14,27] combines concrete and symbolic execution of a
program, making it possible to extract, for a given concrete input, a set of
constraints on the input space that uniquely characterize the corresponding execution
path. To this end, the target program is instrumented to pair each program input
with a symbolic input variable and to record along an execution path the constraints
on the symbolic inputs induced by the encountered conditional branches. The
conjunction of the constraints recorded during the execution of the instrumented
program on a concrete input is called the path condition and characterizes the
equivalence class of all the inputs that would follow the same execution path (in this
paper, we focus on sequential programs, whose execution is uniquely determined
by the program inputs).

Explored path conditions can be stored in a prefix tree (symbolic execution
tree), which captures all the paths already covered by at least one executed input.
A concolic engine can traverse the symbolic execution tree to find branches
not yet explored. The path condition corresponding to the selected unexplored
branch is then solved using a constraint solver (e.g., an SMT solver [26]),
generating a concrete input that will cover the branch. The traversal order used to
find the next branch to be covered is referred to as the search strategy of the concolic
executor.
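
As a toy illustration of how a path condition is recorded, the sketch below instruments a tiny hand-written parser (it accepts exactly the one-character words made of a single digit). The function `run_concolic` and the textual constraint format are assumptions made for this example; a real concolic engine instruments the program automatically and hands the constraints to a solver, which is not shown here.

```python
def run_concolic(word: str) -> tuple[bool, list[str]]:
    """Run the toy parser on a concrete word and record the path condition."""
    path_condition: list[str] = []  # conjunction of the recorded constraints

    def branch(cond: bool, text: str) -> bool:
        # Record the constraint on the symbolic input w that the concrete
        # execution satisfied at this conditional branch.
        path_condition.append(text if cond else f"not ({text})")
        return cond

    if not branch(len(word) >= 1, "len(w) >= 1"):
        return False, path_condition
    if not branch("0" <= word[0] <= "9", "'0' <= w[0] <= '9'"):
        return False, path_condition
    accepted = branch(len(word) == 1, "len(w) == 1")
    return accepted, path_condition
```

Running it on `"7"` and on `"3"` produces the same path condition: both inputs belong to the same equivalence class and follow the same execution path.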

2.4 From path conditions to SFA 


In this paper, we consider only terminating programs that can either accept
or reject a finite input word $w \in \Sigma^*$ (e.g., either parsing it correctly or throwing
a parsing exception). Furthermore, we assume for a given input word $w$, the
resulting path condition to be expressible using a subset of the string constraint
language defined in [5]. This allows the translation of the resulting path condition
into a finite state automaton [5]. The adaptation of this translation procedure
to produce SFAs is straightforward. In particular, we will focus on constraints
$F$ recursively defined as:

$F \to C \mid \neg F \mid F \wedge F \mid F \vee F$

$C \to E \; O \; E \mid len(w) \; O \; E \mid w[n] \; O \; \sigma \mid w[len(w) - n] \; O \; \sigma$

$E \to n \mid n + n \mid n - n$

$O \to {<} \mid {=} \mid {>}$

where $n \in \mathbb{Z}$ is an integer constant and $\sigma \in \Sigma$. Informally, the path condition
corresponding to processing a symbolic input word $w$ should be reducible to a
combination of interval constraints on the linearly ordered alphabet $\Sigma$ for each
of the symbols $w[i]$ composing the input. Despite its restriction, this constraint
language is expressive enough to capture the path conditions obtained from the
concolic execution of a variety of programs that accept regular languages (which
will be described in the evaluation section). The extension to support the entire
string constraint language proposed in [5] is left as future work.


3 Active learning for SFA 


Several active learning algorithms have been defined for SFAs. In this section,
we recall and formalize the core routines of two extensions of L* proposed in [11]
and [3], named A* and MAT*, respectively. We will then extend and adapt these
routines to improve their efficiency and resilience to incomplete oracles based on
partial concolic execution.

Running example. To demonstrate the functioning of the algorithms discussed
in this section and their extensions later on, we introduce here as running
example the SFA accepting the language corresponding to the regular expression
.*\w[^\w]\d[^\d].*, where \w matches any letter, digit, or underscore (i.e.,
[a-zA-Z0-9_]), \d matches any digit, and .* matches any sequence of symbols.
The regular expression is evaluated over the 16-bit Unicode symbols. The
corresponding SFA is represented in Figure 1, where transitions are labeled by the
union of disjoint intervals and each interval is represented as $\sigma_i$-$\sigma_j$, or $\sigma$ if it
is composed of a single element; intervals are separated by a semicolon.


[Figure 1: automaton diagram not reproduced. Its transitions are labeled with
interval unions such as u0000-uffff, u0000-/; :-uffff, and u0000-/; :-@; [-^; `; {-uffff.]


Fig. 1. SFA accepting the language for the running example.


This example highlights the conciseness of symbolic automata. It was chosen
because the benefits of the methodologies discussed in this paper increase as the
transitions are labeled with predicates representing larger sets of characters, and
the intervals used in this example are representative of commonly used ones.


3.1 Learning using observation tables 


A* is an adaptation of L* for learning SFAs. In both algorithms, the learner
stores and processes the information gathered from the oracle in an observation table
(we adapt here the notation defined in [11]):


Definition 3. Observation table [11]. An observation table $T$ for an SFA $M$ is
a tuple $(\Sigma, S, R, E, f)$ where $\Sigma$ is a potentially infinite set called the alphabet;
$S, R, E \subset \Sigma^*$ are finite subsets of words called, respectively, prefixes, boundary,
and suffixes. $f : (S \cup R) \times E \to \{true, false\}$ is a Boolean classification function
such that for a word $w \in (S \cup R)$ and $e \in E$, $f(w.e) = true$ iff $M(w.e)$.
Additionally, the following invariants hold: (i) $S \cap R = \emptyset$, (ii) $S \cup R$ is prefix-closed, and
the empty word $\varepsilon \in S$, (iii) for all $s \in S$, there exists a character $\sigma \in \Sigma$ such
that $s.\sigma \in R$, and (iv) $\varepsilon \in E$.


Figure 2a shows an example observation table ($T$) according to the notation
in [11]. The rows are indexed by elements of $S \cup R$, with the elements of $S$
reported above the horizontal line and those of $R$ below it. The columns instead
are indexed by elements of $E$. An element $s \in S$ represents the access word to a
state $q_s$, i.e., the state that would be reached by parsing $s$ from the initial state.
Elements in the boundary set $R$ provide information about the SFA transitions.
The elements $e \in E$ are discrimination suffixes in that, if there exist $s_i, s_j \in S$
and $e \in E$ such that $f(s_i.e) \neq f(s_j.e)$, $s_i$ and $s_j$ reach different states of $M$. The
cell corresponding to a row index $w \in S \cup R$ and column index $e \in E$ contains
the result of $f(w.e)$, which, for compactness, is represented as + or − when
$f$ evaluates to true or false, respectively. For an element $w \in S \cup R$, we use
$row(w)$ to indicate the vector of +/− in the row of the table indexed by $w$.

An observation table is: closed if for each $r \in R$ there exists $s \in S$ such that
$row(r) = row(s)$; reduced if for all $s_i, s_j \in S$, $s_i \neq s_j \Rightarrow row(s_i) \neq row(s_j)$;
consistent if for all $w_i, w_j \in S \cup R$ and $\sigma \in \Sigma$, if $w_i.\sigma, w_j.\sigma \in S \cup R$ and
$row(w_i) = row(w_j)$ then $row(w_i.\sigma) = row(w_j.\sigma)$; evidence-closed if for all $e \in E$
and $s \in S$, $s.e \in S \cup R$. An observation table is cohesive if it is closed, reduced,
consistent, and evidence-closed. Informally, closed means that every element of
$R$ corresponds to a state identified by an element of $S$; reduced, that every state
is identified by a unique access string in $S$; consistent, that if two words $w_i$ and
$w_j$ are equivalent according to $f$ and $E$, then also $w_i.\sigma$ and $w_j.\sigma$ should be
equivalent for any symbol $\sigma \in \Sigma$.
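
The four properties translate almost literally into code. The sketch below is a hypothetical helper rather than the data structure of [11]: the class `ObservationTable` is an assumed name, and the membership oracle is simulated with Python's `re` module on the running-example expression, using `re.ASCII` so that `\w` and `\d` have the meaning given in the text.

```python
import re
from itertools import combinations


def member(word: str) -> bool:
    """Membership oracle for the running example, simulated with a regex."""
    pattern = r".*\w[^\w]\d[^\d].*"
    return re.fullmatch(pattern, word, re.ASCII | re.DOTALL) is not None


class ObservationTable:
    def __init__(self, S, R, E, member):
        self.S, self.R, self.E, self.member = list(S), list(R), list(E), member

    def row(self, w):
        return tuple(self.member(w + e) for e in self.E)

    def is_closed(self):
        s_rows = {self.row(s) for s in self.S}
        return all(self.row(r) in s_rows for r in self.R)

    def is_reduced(self):
        rows = [self.row(s) for s in self.S]
        return len(set(rows)) == len(rows)

    def is_evidence_closed(self):
        words = set(self.S) | set(self.R)
        return all(s + e in words for s in self.S for e in self.E)

    def is_consistent(self):
        words = set(self.S) | set(self.R)
        for wi, wj in combinations(words, 2):
            if self.row(wi) != self.row(wj):
                continue
            # Symbols sigma such that wi.sigma is a row of the table.
            symbols = {w[len(wi):] for w in words
                       if len(w) == len(wi) + 1 and w.startswith(wi)}
            for sigma in symbols:
                if wj + sigma in words and self.row(wi + sigma) != self.row(wj + sigma):
                    return False
        return True

    def is_cohesive(self):
        return (self.is_closed() and self.is_reduced()
                and self.is_consistent() and self.is_evidence_closed())


# Rows as in the first two tables of the running example discussed later (E = {""}).
open_table = ObservationTable([""], ["A", "A!", "A!0", "A!0B"], [""], member)
closed_table = ObservationTable(["", "A!0B"], ["A", "A!", "A!0"], [""], member)
```

In `open_table`, the row of `A!0B` is + while the only row of S is −, so the table is not closed; moving `A!0B` to S gives `closed_table`, which satisfies all four properties.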

Induced SFA. A cohesive observation table $T$ induces an SFA that accepts
or rejects words consistently with its classification function $f$. Such an induced SFA
$M_T = (\mathcal{A}, Q, q_{init}, F, \Delta)$, where $\mathcal{A}$ is assumed to be the effective Boolean algebra
over the union of intervals of $\Sigma$, is constructed as follows. For each $s \in S$ a
corresponding state $q_s \in Q$ is defined, with the initial state $q_{init}$ being $q_\varepsilon$. The
final states $F$ are all the states $q_s$ such that $f(s) = true$. Since $T$ is cohesive, a
function $g : S \cup R \to S$ can be defined such that $g(w) = s$ iff $row(w) = row(s)$.
Given $g$, for $w \in \Sigma^*$ and $\sigma \in \Sigma$, if $w.\sigma \in S \cup R$ then $(q_{g(w)}, \sigma, q_{g(w.\sigma)}) \in \Delta$.
However, this intuitive construction of the transition relation $\Delta$ would result in
the construction of an FSA, where each transition is labeled with a single element
$\sigma \in \Sigma$. To obtain an equivalent SFA, an additional step is required to learn the
transition predicates of the SFA $M_T$.


[Figure 2: (a) observation table; (b) induced SFA, whose only transition is
labeled [u0000-uffff]. Diagram not reproduced.]


Fig. 2. Example of a cohesive observation table and its induced automata.

Transition predicates. Given a Boolean algebra $\mathcal{A}$ with domain $D = \Sigma$, a
partition function can be defined that generalizes the concrete evidence for a
transition of the induced automaton into a predicate of $\mathcal{A}$. Intuitively, the
resulting predicate for a transition from state $q_i$ to state $q_j$ should evaluate
positively for all the elements $\sigma_j \in \Sigma$ that would label a transition from $q_i$ to $q_j$
according to the function $g$ defined in the previous paragraph, and negatively
for all the elements $\sigma_k$ that would label a transition from $q_i$ to a state other than
$q_j$. Because the function $g$ is by construction a partial function (defined only for
words $w.\sigma \in S \cup R$), the partition function can arbitrarily assign the symbols $\sigma$
not classified by $g$. This produces a natural generalization of the induced SFA
from an observation table.

In this paper, we assume $\mathcal{A}$ to be the Boolean algebra over the union of
intervals over $\Sigma$, with $\Sigma$ being a linearly ordered finite alphabet, such as the ASCII
or Unicode symbols. For this algebra, a partition function can be trivially defined
by constructing for each transition an interval union predicate characterizing all
the concrete evidence symbols that would label the transition according to $g$.
Then, for a given state, the symbols for which $g$ is not defined can be arbitrarily
added to any of the predicates labeling an outgoing transition. A more efficient
definition of a partition function for this algebra is beyond the scope of this
section. The interested reader is instead referred to [11].
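
One possible partition function for the interval algebra is sketched below. It is an illustrative assumption, not necessarily the function used in [11]: each unclassified symbol is assigned to the target state of the closest evidence symbol to its left (or of the first evidence symbol if there is none). The name `partition` and the encoding of states as integers are hypothetical.

```python
def partition(evidence: dict[str, int],
              lo: int = 0x0000,
              hi: int = 0xFFFF) -> dict[int, list[tuple[int, int]]]:
    """Generalize concrete evidence for one source state into interval unions.

    `evidence` maps a symbol to the target state reached on it and must be
    non-empty. The result maps each target state to a list of closed
    intervals that together cover the whole alphabet [lo, hi].
    """
    points = sorted((ord(ch), target) for ch, target in evidence.items())
    result: dict[int, list[tuple[int, int]]] = {}
    start, current = lo, points[0][1]
    for code, target in points[1:]:
        if target != current:
            # Close the running interval just before the new evidence point.
            result.setdefault(current, []).append((start, code - 1))
            start, current = code, target
    result.setdefault(current, []).append((start, hi))
    return result
```

For example, with evidence `!`, `0`, `A` leading to state 0 and `B` leading to state 1, the sketch yields [u0000-u0041] for state 0 and [u0042-uffff] for state 1.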

The introduction of a partition function to abstract concrete transition symbols
into predicates of a Boolean algebra is the key generalization of A* over L*
that allows learning SFAs instead of FSAs. Going back to the observation table
in Figure 2a, the induced SFA is shown in Figure 2b. The observation table
provides concrete evidence for labeling the transition from $\varepsilon$ to itself with the
symbol A. The partition function generalizes this concrete evidence into the
predicate [u0000-uffff], which assigns all the elements of the Unicode alphabet
to the sole outgoing transition from $q_0$.

Learning algorithm. Initially, the learner assumes an observation table
corresponding to the empty language, with $S = E = \{\varepsilon\}$ and $R = \{\sigma\}$ for an
arbitrary $\sigma \in \Sigma$, like the one in Figure 2a. The corresponding induced SFA
$M_T$ is the hypothesis the learner proposes to the equivalence oracle $O_e$. If the
hypothesis does not correspond to the target language, the equivalence oracle
returns a counterexample $c \in \Sigma^*$. There are two possible reasons for a
counterexample: either a new state should be added to the current hypothesis or one
of the predicates in the hypothesis SFA needs refinement. Both cases will be
handled by updating the observation table to include new evidence from the
counterexample $c$, with the partition function automatically refining the transition
predicates according to the new evidence in the table.

To update the observation table, first all the prefixes of $c$ (including $c$ itself)
are added to $R$, except those already present in $S$. (We assume every time an
element is added to $R$, the corresponding row is filled by issuing membership
queries to determine the value of $f(r.e)$, $e \in E$, for each cell.) If for a word
$r \in R$ there is no word $s \in S$ such that $row(r) = row(s)$, the word $r$ identifies a
newly discovered state and it is therefore moved to $S$; a word $r.\sigma$ for an arbitrary
$\sigma \in \Sigma$ is then added to $R$ to trigger the exploration of outgoing transitions from
the newly discovered state. To ensure the updated observation table is
evidence-closed, for all $s \in S$ and $e \in E$, $s.e$ and all its prefixes are added to $R$, if not
already present. Finally, the observation table should be made consistent. To
this end, if there exists an element $\sigma \in \Sigma$ such that $w_i, w_j, w_i.\sigma, w_j.\sigma \in S \cup R$
with $row(w_i) = row(w_j)$ but $row(w_i.\sigma) \neq row(w_j.\sigma)$, then $w_i$ and $w_j$ should
lead to different states. Since $row(w_i.\sigma) \neq row(w_j.\sigma)$, there exists $e \in E$ such
that $f(w_i.\sigma.e) \neq f(w_j.\sigma.e)$. Therefore, $\sigma.e$ can discriminate between the states
reached parsing $w_i$ and $w_j$ and as such $\sigma.e$ should be added to $E$. The observation
table is now cohesive and its induced SFA can be checked against the equivalence
oracle, repeating this procedure until no counterexample can be found.
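
The closing step of this update can be sketched as follows. The helper `close` and the choice of `A` as the arbitrary extension symbol are assumptions for illustration; the membership oracle is again simulated with a regular expression for the running example, and the evidence-closure and consistency steps are not shown.

```python
import re


def member(word: str) -> bool:
    """Membership oracle for the running example, simulated with a regex."""
    pattern = r".*\w[^\w]\d[^\d].*"
    return re.fullmatch(pattern, word, re.ASCII | re.DOTALL) is not None


def close(S, R, E, member, fresh="A"):
    """Move rows of R that match no row of S into S until the table is closed."""
    S, R = list(S), list(R)

    def row(w):
        return tuple(member(w + e) for e in E)

    changed = True
    while changed:
        changed = False
        s_rows = {row(s) for s in S}
        for r in R:
            if row(r) not in s_rows:
                # r identifies a newly discovered state: promote it to S and
                # add r.fresh to R to start exploring its outgoing transitions.
                R.remove(r)
                S.append(r)
                if r + fresh not in S and r + fresh not in R:
                    R.append(r + fresh)
                changed = True
                break
    return S, R


S2, R2 = close([""], ["A", "A!", "A!0", "A!0B"], [""], member)
```

Starting from the table obtained after adding the prefixes of `A!0B`, the sketch promotes `A!0B` to S and adds the extension `A!0BA` to R.
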
Running example. We demonstrate the first iterations of the A* learning
procedure invoked on the automaton in Figure 1. The initial table (Figure 2a) is
cohesive, so an SFA is induced (Figure 2b) and an equivalence query is issued.
The oracle returns the counterexample A!0B. The counterexample and its
prefixes are added to the table (Figure 3a), and the table becomes open. The
table is closed (Figure 3b), and becomes cohesive. An SFA is induced (Figure
4a), and the equivalence query returns the counterexample B. The
counterexample is added to the evidence (Figure 3c), and the table becomes consistent
but open. The table is closed (Figure 3d), and becomes cohesive.


[Figure 3 panels: (a) Add A!0B to table. (b) Close. (c) Add B to table and evidence. (d) Close.] 


Fig. 3. Observation tables for two iterations of A*. 


[Figure 4 panels: (a) SFA for Table 3b. (b) SFA for Table 3d.] 


Fig. 4. Hypothesis automata for the learning iterations in Figure 3. 


3.2 Learning using discrimination trees 


A discrimination tree (DT) is a binary classification tree used by the learner 
to store the information gathered from the teacher. Introduced in [23], it is 
the core data structure of several learning algorithms, including TTT [20] and 
MAT* [3]. We formalize its structure and main routines, which will be the baseline 
for the extensions presented in the next section. 

Recalling from Section 2.2, each state q_a of an SFA M is identified by the 
learner using a unique access word a ∈ Σ*. Given two states q_a and q_b, s ∈ Σ* 
is a discriminator suffix for q_a and q_b if parsing s starting from the two states 
leads to different outcomes (accept or reject). In terms of the state access words, 
this is equivalent to stating M(a.s) ≠ M(b.s). A discrimination tree stores the 
access words and discriminator suffixes learned for an SFA as per the following 
definition: 


Definition 4. Discrimination tree (adapted from [3]). A discrimination tree T 
is a tuple (N, L, T) where N is a set of nodes, L ⊆ N is a set of leaves, and 
T ⊆ N × N × 𝔹 is the transition relation. Each leaf l ∈ L is associated with 
a corresponding access word aw(l). Each internal node i ∈ N \ L is associated 
with a discriminator suffix d(i). For each element (p, n, b) ∈ T, p is the parent 
node of n and if b = true (respectively b = false) we say that n is the accept 
(respectively, reject) child of p. 


For a leaf l ∈ L and inner node n ∈ N \ L, if l is in the subtree of n rooted 
in its accept child, then M(aw(l).d(n)) = true. Similarly, if l is in the reject 
subtree of n, M(aw(l).d(n)) = false. In other words, the concatenation of aw(l) 
with the discriminator suffix of any of its ancestor nodes is accepted iff l is 
in the accept subtree of the ancestor node. For any two leaves l_i, l_j ∈ L let 
n_{i,j} be their lowest common ancestor in the DT. Then the discriminator suffix 
d(n_{i,j}) discriminates the two states corresponding to l_i and l_j, since 
M(aw(l_i).d(n_{i,j})) ≠ M(aw(l_j).d(n_{i,j})), with aw(l_i).d(n_{i,j}) being the accepted 
word if l_i is in the accept subtree of n_{i,j}, or the rejected word otherwise. 

Learning algorithm. We will here refer to the functioning of MAT* [3], although 
the main concepts apply to DT-based learning in general. To initialize 
the DT, the learner performs a membership query on the empty string ε. The 
initial discrimination tree will be composed of two nodes: the root and a leaf 
node, both labeled with ε. Depending on the outcome of the membership query, 
the leaf will be either the accept or the reject child of the root. 


Given a word w ∈ Σ*, to identify the state reached by parsing it according 
to the DT, the learner performs an operation called sift. Sift traverses the 
tree starting from its root r. For each internal node n it visits, it executes the 
membership query O_m(w.d(n)) to check whether w concatenated with the 
discriminator suffix of n is accepted by the target language. If it is accepted, sift 
continues visiting the accept child of n, and the reject child otherwise. If a leaf is 
reached, the learner concludes that parsing w, the target SFA reaches the state 
identified by the leaf's access word. If instead the child node sift should traverse 
next does not exist, a new leaf is created in its place with access word w. 
Membership queries of the form a.σ, where a is an access string in the DT and 
σ ∈ Σ, are then issued to discover transitions of the SFA, possibly leading to the 
discovery of new states. 
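
As an illustration of the initialization and sift steps just described, the following Python sketch may help. It is a simplified stand-in, not the MAT* implementation: the Node class, init_tree, sift, and the toy language even_as are hypothetical names, and the membership oracle is modeled as a plain Python function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    label: str                         # d(n) for inner nodes, aw(l) for leaves
    is_leaf: bool = False
    accept: Optional["Node"] = None
    reject: Optional["Node"] = None


def init_tree(member: Callable[[str], bool]) -> Node:
    """Root and first leaf are both labeled with the empty string."""
    root = Node(label="")
    leaf = Node(label="", is_leaf=True)
    if member(""):
        root.accept = leaf
    else:
        root.reject = leaf
    return root


def sift(root: Node, word: str, member: Callable[[str], bool]) -> Node:
    """Return the leaf reached by word, creating it if the branch is empty."""
    node = root
    while not node.is_leaf:
        branch = "accept" if member(word + node.label) else "reject"
        child = getattr(node, branch)
        if child is None:              # unseen behavior: word names a new state
            child = Node(label=word, is_leaf=True)
            setattr(node, branch, child)
        node = child
    return node


# Toy target language (an assumption): words with an even number of 'a'.
def even_as(word: str) -> bool:
    return word.count("a") % 2 == 0


tree = init_tree(even_as)
print(repr(sift(tree, "aa", even_as).label))    # same leaf as the empty word
print(repr(sift(tree, "a", even_as).label))     # creates a leaf labeled "a"
print(repr(sift(tree, "baaa", even_as).label))  # reaches the leaf "a" again
```

Each visited inner node costs one membership query, which is why the cost of a sift grows with the depth of the tree.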


Induced SFA. A discrimination tree DT induces an SFA M_DT = (A, Q, q_init, 
F, Δ). In this paper, we assume A to be the Boolean algebra over the union of 
disjoint intervals over Σ. Q is populated with one state q_l for each leaf l ∈ L of 
DT. The state q_ε is the initial state. If O_m(aw(l)) = true, then q_l ∈ F is a final 
state of M_DT. To construct the transition relation Δ, sifts of the form aw(l).σ 
for σ ∈ Σ are issued for the states q_l, and the concrete evidence for a transition 
between two states q_i and q_j is summarized into a consistent predicate of A 
using a partition function, as described for A*. 


Counterexamples. The equivalence query O_e(M_DT) will either confirm that the 
learner identified the target language or produce a counterexample c ∈ Σ*. As 
for A*, the existence of c implies that either a transition predicate is incorrect 
or that there should be a new state. To determine the cause of c, the first step 
is to identify the longest prefix c[: i] before the behavior of the hypothesis SFA 
diverged from the target language. To localize the divergence point, the learner 
analyzes the prefixes c[: i] for i ∈ [0, len(c)]. Let a_i be the access string of the 
state of M_DT reached parsing c[: i]. If O_m(a_i.c[i :]) ≠ O_m(c), i is the divergence 
point, which implies that the transition taken from q_{a_i} is incorrect. Let q_j be 
the state corresponding to the leaf reached when sifting a_i.c[i :]. The predicate 
guarding the transition between q_{a_i} and q_j is incorrect if c[i] does not satisfy 
the corresponding transition predicate. This is possible because the partition 
function assigns the symbols in Σ for which no concrete evidence is available 
to any of the outgoing transitions of q_{a_i}. In this case, the transition predicates 
should be recomputed to account for the new evidence from c. If instead c[i] 
satisfies the transition predicate between q_{a_i} and q_j, a new state should be added 
such that parsing c[i] from q_{a_i} reaches it. To add the new state, the leaf labeled 
with a_i is replaced by a subtree composed of three nodes: an internal node with 
discriminator suffix c[i :] having as children the leaf a_i and a new leaf labeled 
by the access string j.c[i], where j is the access string of the state q_j obtained 
by sifting a_i.c[i :]. This procedure is called split (for more details, see, e.g., [3]). 
The updated DT will then be the base for the next learning iteration. 

Running example. A DT corresponding to the running example introduced 
in Section 3 is shown below. While the specific structure of the learned DT 
depends on the order in which words are added to it, all the DTs resulting from 
the learning process induce the same classification of the words w ∈ Σ*, since 
they are consistent representations of the same target language. 


Discriminator: ε 
    Accept: State 4 (access string: "u^4%") 
    Reject: Discriminator: "^4_" 
        Accept: Discriminator: "_" 
            Accept: State 3 (access string: "u^4") 
            Reject: State 1 (access string: "u") 
        Reject: Discriminator: "4_" 
            Accept: State 2 (access string: "u^") 
            Reject: State 0 (access string: ε) 


Fig. 5. Discrimination tree learned for the example of Section 3. 


4 Active learning with concolic execution 


The state-of-the-art active learning algorithms formalized in the previous 
sections are of limited use when trying to infer (an approximation of) the input 
language accepted by a program. Their main limitation is the reliance on a 
complete equivalence oracle, which is unavailable in this case. 

In this section, we will propose several extensions of the A* and MAT* 
algorithms formalized in Sections 3.1 and 3.2 that make use of a concolic execution 
engine to 1) gather enhanced information from membership queries thanks to 
the path condition computed by the concolic engine, and 2) mitigate the lack of 
an equivalence oracle using the concolic engine to find counterexamples for a 
hypothesis. While it is usually unrealistic to assume a complete concolic execution 
of a large program (which would per se be sufficient to characterize the accepted 
input language), the ability of the concolic engine to execute each execution path 
only once brings significant benefits in our preliminary evaluation. Additionally, 
because the concolic engine can ask a constraint solver to produce inputs with 
a bounded length, it can be used to prove bounded equivalence between the 
learned input SFA and the target language. Finally, the availability of a partial 
symbolic execution tree and a constraint solver enables the definition of more 
effective types of membership queries. 


4.1 Concolic learning with symbolic observation tables 


Given a program P, its concolic execution on a word w ∈ Σ* produces a 
boolean outcome (accept/reject) and a path condition capturing the properties 
of w that led to that outcome. In particular, we assume the path condition to be 
reducible to the constraint language defined in Section 2.4, i.e., the conjunction 
of interval predicates on the elements w_i of w and its length len(w). Under this 
assumption, the path condition is directly translatable to a word w_Ψ over the 
predicates Ψ of the Boolean algebra A over the union of intervals over Σ. We 
will therefore refer to the path condition produced by the concolic execution of 
a word w ∈ Σ* as the Ψ-predicate word w_Ψ, where len(w) = len(w_Ψ). 


Symbolic observation table. The surjective relation between concrete words 
w and their predicates w_Ψ enables a straightforward extension of the observation 
table used for A*, where the rows of the table can be indexed by words w_Ψ ∈ Ψ* 
instead of concrete words from Σ*, i.e., S ∪ R ⊆ Ψ*. This allows for each row 
index to account for the entire equivalence class of words w ∈ Σ* that would 
follow the same execution path (these words will also have the same length). We 
denote by [w_Ψ] a concrete representative of the class w_Ψ. The set of suffixes 
E ⊆ Σ* will instead contain concrete elements of the alphabet. 


Membership queries. Executing a membership query of the form O_m([w_Ψ].σ), 
with σ ∈ Σ, will produce both the boolean outcome (accept/reject) and a word 
in Ψ* that can be added to R, if not already present. As a result, the transition 
predicates of the induced SFA can be obtained directly from the symbolic 
observation table, avoiding the need for a partition function to synthesize Ψ-predicates 
from the collected concrete evidence, as required in A*. The transition relation 
is then completed by redirecting every σ ∈ Σ that does not satisfy any of the 
discovered transition predicates to an artificial sink state. The induced SFA is 
then used as hypothesis for the next equivalence query. 


Equivalence queries. Because a complete equivalence oracle for the target 
language is not available, we will use concolic execution to obtain a bounded 
equivalence oracle comparing the hypothesis SFA induced by the symbolic 
observation table with the program under analysis. To this end, we translate the 
hypothesis SFA into a function in the same programming language as the target 
program P that takes as argument a word w and returns true (respectively, 
false) if the hypothesis SFA accepts (respectively, rejects) the word. We assume 
P to be wrapped into an analogous boolean function. We then write a program 
asserting that the results of the two functions are equal and use the concolic 
engine to find an input word that violates the assertion. If such a word can be found, 
the counterexample is added to the symbolic observation table and the learner 
starts another iteration. If the concolic execution terminates without finding 
any assertion violation, it can be concluded that the hypothesis SFA represents 
the input language of P. However, it is usually unrealistic to assume the 
termination of the concolic execution. Instead, we configure the solver to search for 
counterexamples up to a fixed length n. Assuming this input-bounded concolic 
execution terminates without finding a counterexample, it can be concluded that 
the hypothesis is equivalent to P's input language for every word up to length 
n. Notably, this implies that if the target language is actually regular and the 
corresponding minimal automaton has at most n states, the hypothesis learned 
the entire language. 

Running example. A symbolic observation table inducing the SFA for the 
example introduced in Section 3 is shown in Figure 6. The use of Ψ predicates 
to index its rows significantly reduces the size of the table, since each row index 
accounts for a possibly large number of concrete elements of Σ. 

Fig. 6. Symbolic observation table for the example of Section 3. 
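
The shape of the bounded equivalence check of Section 4.1 can be sketched in Python as follows. This is only an approximation under stated assumptions: exhaustive enumeration over a tiny alphabet stands in for the concolic search, and bounded_counterexample, the toy program ends_with_ab, and the hand-written hypothesis automaton are hypothetical names introduced for illustration.

```python
from itertools import product
from typing import Callable, Optional


def bounded_counterexample(program: Callable[[str], bool],
                           hypothesis: Callable[[str], bool],
                           alphabet: str,
                           max_len: int) -> Optional[str]:
    """Return a word of length <= max_len on which the two functions differ."""
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            word = "".join(symbols)
            if program(word) != hypothesis(word):   # the assertion is violated
                return word
    return None                                     # equivalent up to max_len


# Toy program under analysis (an assumption): accepts words ending in "ab".
def ends_with_ab(word: str) -> bool:
    return word.endswith("ab")


# Hypothesis SFA translated into a boolean function, as a transition table.
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1,
         (1, "b"): 2, (2, "a"): 1, (2, "b"): 0}


def hypothesis_sfa(word: str) -> bool:
    state = 0
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == 2


def reject_all(word: str) -> bool:
    return False


print(bounded_counterexample(ends_with_ab, reject_all, "ab", 6))
print(bounded_counterexample(ends_with_ab, hypothesis_sfa, "ab", 6))
```

A result of None only certifies agreement on words up to the chosen bound, which mirrors the guarantee of the input-bounded concolic execution.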


4.2 Concolic learning with a symbolic membership oracle 


In the previous section, we used the concolic engine to extract the path 
conditions corresponding to the execution of membership queries produced by the 
learner. This reduced the number of queries, since each query gathers 
information about a set of words instead of a single one, and kept the observation 
table more compact. In this section, we introduce an oracle that answers 
a new class of symbolic membership queries (SMQs) using the constraint solving 
capabilities of the concolic engine to directly compute predicates characterizing 
all the accepted words of the form p.σ.s, where p, s ∈ Σ* and σ ∈ Σ. This oracle 
will enable a more efficient learning algorithm based on an extension of the 
discrimination tree data structure. 


Definition 5. Symbolic Membership Oracle (O_s). Given a Boolean algebra A 
with predicate set Ψ, a symbolic membership oracle O_s : Σ* × Σ* → Ψ takes as 
input a pair (p, s) and returns a predicate ψ ∈ Ψ such that for a symbol σ ∈ Σ, 
the target program accepts p.σ.s iff σ ⊨ ψ. p and s are called prefix and suffix, 
respectively. 


An SMQ can be answered by issuing a membership query for each σ ∈ Σ. 
However, this operation would be costly for large alphabets, such as Unicode. 
On the other hand, the concolic execution of w = p.σ.s for a concrete symbol 
σ returns via the path condition the entire set of symbols that would follow 
the same execution path, in turn leading to the same execution outcome. A 
constraint solver can then be used to generate a new concrete input outside of 
such set, which is guaranteed to cover a new execution path. This procedure 
is summarized in Algorithm 1, where we use pathCondition[σ] to represent the 
projection of the path condition on the element of the input string w = p.σ.s 
corresponding to the position of σ. 


Input: SMQ Q = (p, s); concolic : Σ* → (accepted, pathCondition) 
Result: ψ such that ∀σ ∈ Σ : p.σ.s is accepted iff σ ⊨ ψ 

ψ ← ⊥; 
unknown ← Σ; 
while unknown ≠ ∅ do 
    σ ← pickElementFrom(unknown); 
    accepted, pathCondition ← concolic(p.σ.s); 
    if accepted then 
        ψ ← ψ ∨ pathCondition[σ]; 
    end 
    unknown ← unknown ∧ ¬pathCondition[σ]; 
end 
return ψ; 

Algorithm 1: Answering SMQ queries. 
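
A runnable approximation of Algorithm 1 is sketched below in Python. Predicates are modeled as frozensets over a small alphabet instead of interval predicates, and toy_concolic is a hypothetical stand-in for a real concolic engine (it mimics a checker that accepts non-empty words not starting with a digit); none of these names come from Coastal or SVPAlib.

```python
import string

DIGITS = frozenset(string.digits)
LETTERS = frozenset(string.ascii_lowercase)
UNDERSCORE = frozenset("_")
ALPHABET = DIGITS | LETTERS | UNDERSCORE
CLASSES = (DIGITS, LETTERS, UNDERSCORE)


def toy_concolic(word):
    """Stand-in for a concolic run of a toy identifier checker.

    Returns (accepted, path_condition); path_condition[i] is the set of
    symbols that would follow the same execution path at position i.
    """
    path_condition = [next(c for c in CLASSES if ch in c) for ch in word]
    accepted = len(word) > 0 and word[0] not in DIGITS
    return accepted, path_condition


def answer_smq(prefix, suffix, concolic, alphabet):
    """Return (psi, runs); psi holds every sigma with prefix.sigma.suffix accepted."""
    psi = frozenset()                               # psi <- bottom
    unknown = frozenset(alphabet)                   # unknown <- Sigma
    runs = 0
    while unknown:
        sigma = min(unknown)                        # pickElementFrom(unknown)
        accepted, path_condition = concolic(prefix + sigma + suffix)
        runs += 1
        same_path = path_condition[len(prefix)]     # pathCondition[sigma]
        if accepted:
            psi |= same_path
        unknown -= same_path                        # never pick this path again
    return psi, runs


psi, runs = answer_smq("", "x1", toy_concolic, ALPHABET)
print("".join(sorted(psi)), runs)
```

In this toy setting the query is answered with three concolic runs, one per character class, rather than one membership query for each of the 37 symbols.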


Learning transition predicates with O_s. Consider the learning algorithm 
using discrimination trees introduced in Section 3.2, MAT*. After each iteration, 
the discrimination tree DT contains in its leaves all the discovered states 
(identified by the respective access words), organized according to their 
discriminator suffixes (labeling the internal nodes of DT). To construct the transition 
relation of the induced SFA, the algorithm executes for each leaf l and σ ∈ Σ a 
sift operation to determine the state reached when parsing aw(l).σ. Each such 
sift operation requires as many membership queries as the depth of the reached 
state to be determined. Therefore, the number of sift operations needed to 
construct the complete transition relation is proportional to the number of states 
times the size of the input alphabet, with each sift operation issuing a number 
of membership queries proportional to the depth of DT. 

Using the symbolic membership oracle, we can instead define a procedure that, 
traversing DT, directly synthesizes the transition predicate between a source state 
q_s and a destination state q_t of the induced SFA. This procedure is formalized 
in Algorithm 2. 

Input: DT = (N, L, T); O_s : Σ* × Σ* → Ψ; source state q_s; target state q_t 
Result: The transition predicate π between q_s and q_t 
n ← root of DT; 
π ← ⊤; 
while n ∈ N \ L do 
    ψ ← O_s(aw(q_s), d(n)); 
    if q_t in the accept subtree of n then 
        π ← π ∧ ψ; 
        n ← acceptChild(n); 
    else 
        π ← π ∧ ¬ψ; 
        n ← rejectChild(n); 
    end 
end 
return π; 
Algorithm 2: Learning transition predicates with O_s. 


Algorithm 2 allows constructing the induced SFA by computing, for each 
ordered pair of leaves of DT, the transition predicate of the corresponding 
transition. This results in the direct construction of the complete transition relation 
of the induced SFA. In practice, the implementation of Algorithm 2 can be 
improved by observing that π_i computed in the i-th iteration of the loop is by 
construction a subset of π_{i-1}. The symbolic membership oracle O_s can make use 
of this observation to limit the search procedure for the construction of the 
predicate ψ_i during the i-th iteration to only symbols that satisfy π_{i-1}, significantly 
improving its efficiency. Finally, for the same reason, the loop in Algorithm 2 can 
terminate as soon as π = ⊥, which indicates that no transition exists between 
the source and destination states. 

Example. Referring to the discrimination tree in Figure 5 for the example 
introduced in Section 3, assume we want to learn the transition predicate from State 
2 to State 3. Initially, π_0 = ⊤. The access string of State 2 is "u^". The suffix 
of the root node is ε. Invoking the symbolic membership oracle, we obtain 
ψ = O_s("u^", ε) = ⊥ (no strings of length 3 are accepted by the target language). 
Because State 3 is in the reject subtree, π_1 = π_0 ∧ ¬⊥ = ⊤ and the execution 
moves to the internal node labeled with the discriminator suffix "^4_". The 
corresponding SMQ returns ψ = O_s("u^", "^4_") = {0...9, A...Z, _, a...z}. 
Because State 3 is in the accept subtree of the current node, π_2 = π_1 ∧ ψ = 
{0...9, A...Z, _, a...z} and the execution moves to the internal node with 
discriminator suffix "_", where π_3 = [0...9] is finally computed as the transition 
predicate from State 2 to State 3. 
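
The traversal performed by Algorithm 2, including the early exit when the predicate becomes empty, can be sketched in Python as follows. This is an illustrative simplification: predicates are frozensets over a two-symbol alphabet, the symbolic oracle is emulated by brute force, and Node, contains, transition_predicate, and the toy language even_as are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    label: str                         # d(n) for inner nodes, aw(l) for leaves
    accept: Optional["Node"] = None
    reject: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.accept is None and self.reject is None


def contains(node: Optional[Node], leaf: Node) -> bool:
    """True if leaf lies in the subtree rooted at node."""
    if node is None:
        return False
    return node is leaf or contains(node.accept, leaf) or contains(node.reject, leaf)


def transition_predicate(root, source, target, smq, alphabet):
    """Symbols leading from the state of leaf source to the state of leaf target."""
    pi = frozenset(alphabet)                       # pi <- top
    node = root
    while not node.is_leaf() and pi:               # stop early once pi is empty
        psi = smq(source.label, node.label)        # O_s(aw(q_s), d(n))
        if contains(node.accept, target):
            pi, node = pi & psi, node.accept
        else:
            pi, node = pi - psi, node.reject
    return pi


# Toy target language (an assumption): words with an even number of 'a'.
ALPHABET = "ab"


def even_as(word: str) -> bool:
    return word.count("a") % 2 == 0


def brute_force_smq(prefix: str, suffix: str) -> frozenset:
    """Emulates the symbolic oracle with one membership check per symbol."""
    return frozenset(s for s in ALPHABET if even_as(prefix + s + suffix))


q_even, q_odd = Node(""), Node("a")
root = Node("", accept=q_even, reject=q_odd)
for src in (q_even, q_odd):
    for dst in (q_even, q_odd):
        pi = transition_predicate(root, src, dst, brute_force_smq, ALPHABET)
        print(repr(src.label), "->", repr(dst.label), sorted(pi))
```

The function assumes that the target leaf belongs to the tree; a real implementation would replace brute_force_smq with the concolic oracle of Algorithm 1.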


Decorated discrimination tree. For every leaf l and internal node n of a 
discrimination tree DT, Algorithm 2 issues an SMQ (aw(l), d(n)). The 
corresponding intermediate value of the transition predicate π is intersected 
with the result of the SMQ or its negation, depending on whether l is in 
the accept or the reject subtree of n. Notably, the addition of a newly discovered 
state to DT does not change the relative positioning of a leaf l with respect to an 
internal node n, i.e., if l is initially in the accept (respectively, reject) subtree of 
n, it will remain in that subtree also after a new state is added. This observation 
implies that the results of the SMQs performed through Algorithm 2 
remain valid between different executions of the algorithm. Therefore, when 
a new state is discovered and added to the discrimination tree via the split 
operation defined in Section 3.2, only the symbolic membership queries involving 
the new internal node and the new leaf added by split would require an actual 
execution of the symbolic membership oracle. 


To enable the reuse of previous SMQs issued through Algorithm 2, 
we decorate the DT by adding to each node a map from the set of leaves L to 
the value of π computed when traversing the node. We refer to this map as the 
predicate map. Every predicate in the root node map is ⊤, as this is the initial 
value of π in Algorithm 2. The maps in the children nodes are then computed 
as follows. Let n be a parent node and n_a, n_r its accept and reject children 
respectively, m be a leaf of DT, and π_n^m, π_{n_a}^m, π_{n_r}^m the predicates for m stored 
in n, n_a, and n_r, respectively. Then π_{n_a}^m = O_s(aw(m), d(n)) ∧ π_n^m and 
π_{n_r}^m = ¬O_s(aw(m), d(n)) ∧ π_n^m. Proceeding recursively, a leaf l will be decorated with 
a predicate map assigning to each leaf l_i in DT the predicate of the transition 
going from q_{l_i} to q_l. 
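
The recursive computation of the predicate maps can be illustrated with the Python sketch below. It is a simplified model under stated assumptions: the tree is encoded as nested tuples, predicates are frozensets over the alphabet "ab", the oracle is emulated by brute force, and decorate, brute_force_smq, and the toy language ends_with_ab are hypothetical names. A real implementation would also cache oracle answers per (leaf, node) pair so that they are reused after a split.

```python
ALPHABET = "ab"


# Toy target language (an assumption): words ending in "ab".
def ends_with_ab(word):
    return word.endswith("ab")


def brute_force_smq(prefix, suffix):
    """Emulates O_s(prefix, suffix) with one membership check per symbol."""
    return frozenset(s for s in ALPHABET if ends_with_ab(prefix + s + suffix))


def decorate(tree, sources, smq, alphabet, inherited=None, maps=None):
    """Compute the predicate map of every leaf.

    tree is either a leaf (its access word, a str) or a triple
    (discriminator, accept_subtree, reject_subtree).
    """
    if inherited is None:                     # root map: every entry is top
        inherited = {m: frozenset(alphabet) for m in sources}
    if maps is None:
        maps = {}
    if isinstance(tree, str):                 # leaf: store the final predicates
        maps[tree] = inherited
        return maps
    suffix, accept_subtree, reject_subtree = tree
    answers = {m: smq(m, suffix) for m in sources}        # O_s(aw(m), d(n))
    to_accept = {m: inherited[m] & answers[m] for m in sources}
    to_reject = {m: inherited[m] - answers[m] for m in sources}
    decorate(accept_subtree, sources, smq, alphabet, to_accept, maps)
    decorate(reject_subtree, sources, smq, alphabet, to_reject, maps)
    return maps


# Leaves "", "a", "ab"; root discriminator "", inner discriminator "b".
tree = ("", "ab", ("b", "a", ""))
sources = ["", "a", "ab"]
maps = decorate(tree, sources, brute_force_smq, ALPHABET)
for leaf, predicate_map in maps.items():
    for source, predicate in predicate_map.items():
        print(repr(source), "->", repr(leaf), sorted(predicate))
```

For each source leaf, the predicates stored in the leaves are pairwise disjoint and together cover the alphabet, so they form a complete transition relation.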


Figure 7 shows the decorated version of the discrimination tree of Figure 5, 
constructed for the same example language introduced in Section 3. 


[Figure 7: the discrimination tree of Figure 5, with every internal node and leaf annotated with its predicate map (one predicate for each of the leaves 0 to 4). Every entry of the root map is u0000-uffff.] 


Fig. 7. Decorated version of the discrimination tree in Figure 5. 

Induced SFA and number of equivalence queries. Notice that, by 
construction, for every node n with accept child n_a and reject child n_r, if π_n^l, π_{n_a}^l, 
and π_{n_r}^l are the predicates the three nodes associate with a leaf l, then 
π_{n_a}^l ∨ π_{n_r}^l = π_n^l and π_{n_a}^l ∧ π_{n_r}^l = ⊥. As a consequence, after all the maps 
decorating the nodes of the discrimination tree are completed, the predicates in 
the leaves represent the complete transition relation of the induced SFA. 
Furthermore, the maps grow monotonically through the learning process, with 
entries computed in previous iterations remaining valid throughout the entire 
process. Practically, after each split operation resulting from the counterexample 
of an equivalence query (see Section 3.2), we traverse the discrimination tree and 
incrementally update all the predicate maps to include information about 
transitions to the new leaf, as well as populating the maps of the new internal 
node and new leaf added by the split. 

Differently from the original algorithm MAT* described in Section 3.2, a 
counterexample for the induced SFA corresponding to a decorated discrimination 
tree can only be returned if a new state has been discovered. This bounds 
the number of equivalence queries to at most the minimum number 
of states needed to represent the target language as an SFA. In our setting, a 
complete equivalence oracle is not available for the target program P. 
Equivalence queries are instead solved using an (input-bounded) concolic execution 
that compares the hypothesis SFA (induced by the discrimination tree) with the 
original program. Because this execution is computationally expensive, reducing 
the number of necessary equivalence queries has a significant impact on the 
execution time (at the cost of keeping in memory the node predicate maps). 


5 Experimental evaluation 


5.1 Experimental Setup 


In this section we evaluate a prototype implementation of our contributions, 
built upon SVPAlib [9] (the symbolic automata and alphabet theory library 
used by MAT*) and Coastal [12], a concolic execution engine for Java bytecode. 
In Section 5.2 we consider our approach of using symbolic observation tables 
from Section 4.1 (referred to as SYMLEARN in the following presentation) and 
in Section 5.3 we evaluate the use of the symbolic membership queries from 
Section 4.2 (referred to as MAT*++). All the experiments have been executed 
on a server equipped with an AMD EPYC 7401P 24-Core CPU and 440Gb 
of memory. Coastal was configured to use at most 3 threads using its default 
generational exploration strategy [12] to find counterexamples for equivalence 
queries. 

The experiments in this section are based on regular expressions taken from 
the AutomatArk [8] benchmark suite. To ensure a uniform difficulty distribution 
among the experiments, the regular expressions were converted to their 
automaton representation, sorted by the number of states, and 200 target automata 
were selected by stratified sampling (the maximum number of states in an automaton 
is 637 and the average 33; the maximum number of transitions is 2,495 and the 
average 96). Each automaton is then translated into a Java program accepting 
the same language and compiled. The program analysis is performed on the 
resulting bytecode. 

In the first experiment, we demonstrate the increase in query efficiency we 
achieve by comparing the number of queries, using a complete oracle that can 
answer equivalence queries in a negligible amount of time. In this idealised setup 
the learner halted when the correct automaton was identified, relying on the 
fact that the oracle can confirm the correctness of the hypothesis. Although this 
setup does not represent a realistic scenario, it allowed us to reliably evaluate 
the number of queries of each type that are required to converge to the correct 
automaton, and to measure the computational requirements of the learning 
algorithm in isolation. 

In the second experiment, we demonstrate the use of a concolic engine as 
a symbolic oracle, and measure the impact on the execution time of the 
algorithms. Providing a meaningful evaluation of the cost of the equivalence queries 
is difficult, as it is essentially a software verification problem over arbitrary Java 
code, and in principle an equivalence query could never terminate. Instead, a 
complete concolic analysis of each parser is performed, without using the perfect 
oracle for any type of query, and enforcing a timeout of ten minutes for each 
analysis, after which the learner yielded its latest hypothesis. The correctness of 
that hypothesis is then confirmed by comparison to the known target automaton. 
Note also that we use an input string length limit of 30 for the words to be 
parsed during concolic execution. 

5.2 Learning with symbolic observation tables 


Evaluating the algorithm with a perfect oracle. We learn 78% of the target 
languages within the ten minute timeout using a perfect oracle for equivalence. 
We see a 54% reduction in the total number of membership queries, and an 88% 
reduction in the total number of equivalence queries over MAT* (see Table 1). 


Table 1. Number of queries and execution time with perfect oracle. 


Algorithm    Membership queries    Equivalence queries    Execution time (s) 

MAT*         1,545,255             25,802                 38.60 
SYMLEARN     720,658               3,124                  1321.70 


The SYMLEARN approach requires the path conditions to be stored in the 
observation table, even when using a perfect oracle for equivalence. In order 
to achieve this, concrete counterexamples from the perfect oracle are resolved 
to path conditions via the concolic engine. The slower execution time can be 
attributed to the infrastructure overhead present in our implementation, and 
the speed of the concolic engine when performing these resolutions. 

Evaluating the algorithm with the concolic oracle. We now replace the 
perfect equivalence oracle with a concolic execution engine, as described in 
Section 4.1. We learn 30% of the target automata within the ten minute timeout. 
The execution time is orders of magnitude slower than that of MAT*, and in 
our implementation 99% of the learner's execution time is spent running symbolic 
equivalence queries. While the increase in bandwidth due to the path conditions 
returned for each query does result in a significant reduction of queries overall, 
the execution time of the SYMLEARN approach remains orders of magnitude slower 
than MAT*, partly because SYMLEARN requires the actual (concolic) execution 
of the program implementation, instead of performing queries on an SFA 
representation of the regular expression. There are however a number of optimizations 
that can be made to improve the performance (some of which will be discussed 
in the following section). 


5.3 Learning with symbolic membership queries 


Under the assumption that the language to be learned is regular, and that the 
equivalence check will eventually find a counterexample if there exists one, our 
active learning approach guarantees that eventually the correct hypothesis will 
be generated. The experimental evaluation was therefore aimed at understanding 
what is achievable in a realistic setting, with constrained time, and how our 
methodology improves the outcome. 


Table 2. Number of queries and execution time with perfect oracle. 


Algorithm    Membership queries    SMQ       Equivalence queries    Learner execution time 

MAT*         3,517,474             -         47,374                 137.51 s 
MAT*++       42,075                81,401    1,913                  1.33 s 


Table 2 shows the total number of queries³ necessary to learn the correct 
automaton over the 200 test cases, along with the CPU time used by the learner 
process alone, without considering the time required to answer the queries. 
The decrease in the CPU time required by the learner process can be 
explained by the reduction in the number of counterexamples that the learner has 
to process (recall that in MAT*++, a counterexample can be caused only by a 
missing state in the hypothesis, while in MAT* it can also be due to an 
incorrect transition predicate). To understand the balance between the benefit due to 
the sharp reduction in the number of membership and equivalence queries, and 
the cost due to the introduction of the symbolic queries, the next section will 
evaluate the cost of answering each type of query without the help of a perfect 
oracle. 

³ Note that all 200 automata are included in Table 2, whereas only the results for the 
subset that finished before the timeout were shown in Table 1. 

Evaluating the impact of SMQ. First, observe that the impact of membership 
queries is negligible, since each is simply a check to see if an input is accepted. 
However, measuring the complexity of the symbolic membership queries (SMQs) 
is crucial to assess the effectiveness of our approach. Answering an SMQ requires 
the concolic execution of the program under analysis potentially multiple times, 
and requires processing each resulting path condition to collect the information 
needed to refine the answer. In this experiment we measured the time and the 
number of concolic executions required to answer all the SMQs of Table 2. The 
total time required to answer the queries was 4,408 s, with an average of 54.15 
ms per query. The number of concolic executions per query was between 1 and 
31, with an average of 3.45. Since the concolic execution requires the program 
under analysis to be instrumented and a symbolic state to be maintained, it is 
orders of magnitude slower than a standard concrete execution. 

Evaluating the impact of equivalence queries. Each equivalence query is 
answered in the same way as in the SYMLEARN approach (see Section 4.1), by 
doing a concolic execution of the hypothesis and the program being analyzed 
on the same symbolic input to see if they give a different result; if so, we have 
a counter-example, otherwise we simply know none could be found before the 
timeout or within the input string length of 30. As a further optimization, we also 
maintained two automata knownAccept and knownReject that were the union of 
the automata translation of the path conditions of all the previously explored 
accepted and rejected inputs respectively. 
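
The contract of such an input-bounded equivalence check can be sketched in Python. This is only an illustration under stated assumptions: it swaps the concolic engine for brute-force enumeration of all strings up to a length bound, it represents the knownAccept and knownReject caches as plain sets of strings rather than automata, and the names `hypothesis`, `program`, `known_accept`, and `known_reject` are hypothetical.

```python
from itertools import product


def bounded_equivalence(hypothesis, program, alphabet, max_len,
                        known_accept=frozenset(), known_reject=frozenset()):
    """Return a counter-example string, or None if none exists up to max_len.

    hypothesis and program are predicates str -> bool (hypothetical stand-ins
    for the learned automaton and the program under analysis).
    """
    # Cheap check first: inputs whose outcome is already known.
    for w in known_accept:
        if not hypothesis(w):
            return w
    for w in known_reject:
        if hypothesis(w):
            return w
    # Brute-force enumeration replaces concolic exploration in this sketch.
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if hypothesis(w) != program(w):
                return w
    # No difference within the bound; this is not a proof of equivalence.
    return None
```

As in the experiment, a `None` answer only means that no counter-example exists within the bound; a longer distinguishing input may still exist.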

In this experiment 1,207 equivalence queries were issued, and on average it
took 56.92 s to answer a query. Of these, 573 were answered in negligible time using
the knownAccept and knownReject automata (demonstrating the usefulness
of this optimization); in 93 cases Coastal could not find a counter-example (within
the input size limit); in 107 the timeout occurred; and in the rest a counter-example
was found by Coastal. In 152 cases the correct automaton was learned (76%),
and interestingly in 62 of these cases Coastal timed out (but the current hypothesis
at the time was in fact correct). In 3 of the cases Coastal finished exploring
the complete state space up to the input length of 30 before the timeout, but the correct
automaton was not learned. This happened because a counter-example requiring
more than 30 input characters exists.
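
The figures reported for this experiment can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (including the 200 automata of Table 2); the variable names are illustrative.

```python
total_queries = 1207
from_cache = 573         # answered via knownAccept / knownReject
no_counterexample = 93   # bounded space explored, nothing found
timed_out = 107

# "The rest" of the queries are those for which a counter-example was found.
counterexamples = total_queries - from_cache - no_counterexample - timed_out

cache_share = from_cache / total_queries   # fraction answered from the cache
learned_share = 152 / 200                  # correct automata out of 200

print(counterexamples)          # 434
print(f"{cache_share:.1%}")     # 47.5%
print(f"{learned_share:.0%}")   # 76%
```

Nearly half of the equivalence queries are thus answered from the cached automata alone.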


Discussion of the results. The benefit of the symbolic membership queries
is clear: they reduce the number of equivalence queries by 96%, and the latter
are by far the most expensive step in active learning without a perfect oracle.
Furthermore, simple engineering optimizations, for example a caching scheme 
for the accepted and rejected path conditions, can have a significant impact on 
the execution time. 


6 Related work 


The problem of learning input grammars has been tackled using a variety of 
techniques, and with various specific goals in mind. 


6.1 Active learning 


The active learning algorithms most closely related to ours are Λ* [11] and
MAT* [3], which have been extensively discussed in Section 3.

Argyros et al. [4] used Angluin-style active learning of symbolic automata 
for the analysis of finite state string sanitizers and filters. Being focused on 
security, their goal was not to learn exactly the filter under analysis, but to verify 
that it filters every potentially dangerous string. In the proposed approach each 
equivalence query is approximated with a single membership query, which is a 
string that is not filtered by the current hypothesis, but belongs to the given
language of “dangerous” strings. If no such string exists, the filter is considered
successfully validated. If the string exists but is successfully filtered it provides 
a counterexample with which the hypothesis is refined, otherwise a vulnerability 
in the filter has been found. This equivalence approximation is incomplete, but 
greatly simplifies the problem, considering the complexity of equivalence queries. 

Multiple other approaches use different active learning techniques not based
on L* that, compared to our solution, provide fewer theoretical guarantees and
often rely on a corpus of valid inputs. Glade [6] generates a context-free grammar
starting from a set of seed inputs that the learner attempts to generalize, using 
a membership oracle to check whether the generalization is correct. No other 
information is derived from the execution of the program under analysis, and 
therefore the set of seed inputs is of crucial importance. Reinam [29] further 
extends Glade by using symbolic execution to automatically generate the seed 
inputs, and adding a second probabilistic grammar refinement phase in which 
reinforcement learning is used to select the generalization rules to be applied. 

The approach proposed by Höschele et al. [19] uses a corpus of valid inputs
and applies generalizations that are verified with a membership oracle. Dynamic 
taint analysis is used to track the flow of the various fragments of the input 
during the execution, extracting additional information that aids in the creation 
of the hypothesis and generates meaningful non-terminal symbol names. A sim- 
ilar approach is used by Gopinath et al. [16], with the addition of automatic 
generation of the initial corpus. 


6.2 Passive learning 


Godefroid et al. [15] use recurrent neural networks and a corpus of sample 
inputs to create a generative model of the input language. This approach does 
not learn any information from the system under test, so the sample corpus is 
important. 

A completely different approach is used by Lin et al. [24] to tackle a related 
problem: reconstructing the syntax tree of arbitrary inputs. The technique is 
based on the analysis of an execution trace generated by an instrumented version 
of the program under analysis. This approach relies on the knowledge of the 
internal mechanisms used by different types of parsers to generate the syntax 
tree. 

Tupni [7] is a tool to reverse engineer input formats and protocols. Starting 
from one or more seed inputs, it analyzes the parser execution trace together with 
data flow information generated using taint analysis, identifies the structure of 
the input format (how the data is segmented in fields of various types, aggregated 
in records, and some constraints that must be satisfied), and generates a context 
free grammar. 


7 Conclusions 


Most established active learning algorithms for (symbolic) finite state automata
assume the availability of a minimal adequate teacher, which includes
a complete equivalence oracle to produce counterexamples disproving an incorrect
hypothesis of the learner. This assumption is unrealistic when learning the
input grammar of a program from its implementation, as such a complete oracle
would need to automatically check the equivalence of the hypothesis with
arbitrary software code. In this paper, we explored how the use of a concolic
execution engine can improve the information efficiency of membership queries,
provide a partial input-bounded oracle to check the equivalence of a hypothesis
against a program, and enable the definition of a new class of symbolic membership
queries that allow the learner to infer the transition predicates of a
symbolic finite state automaton representation of the target input language more
efficiently.

Preliminary experiments with the AutomatArk [8] benchmark showed that
our implementations of SYMLEARN and MAT*++ achieve a significant reduction
(up to 96%) in the number of queries required to actively learn the input
language of a program in the form of a symbolic finite state automaton. Despite
bounding the total execution time to 10 minutes and using the concolic execution
engine as a partial equivalence oracle, MAT*++ managed to learn the correct
input language in 76% of the cases.

These results demonstrate the suitability of concolic execution as an enabling
tool for the definition of active learning algorithms for the input grammar of a
program. However, our current solutions learn the input grammar in the form
of a symbolic finite state automaton. This implies that only an approximation
of non-regular input languages can be constructed. Such an approximation can at
best match the input language up to a finite length, but would fail to recognize
more sophisticated language features that may require, for example, a context-free
representation. Investigating how the learning strategies based on concolic
execution that we explored in this paper can generalize to more expressive language
models is envisioned as a future direction for this research, as well as the use of
the inferred input languages to support parser validation and grammar-based
fuzzing.


Abstract. Alloy is a lightweight specification language based on relational logic,
with an analysis engine that relies on SAT solvers to automate bounded verifica- 
tion of specifications. In spite of its strengths, the reliance of the Alloy Analyzer 
on computationally heavy solvers means that it can take a significant amount of 
time to verify software properties, even within limited bounds. This challenge 
is exacerbated by the ever-evolving nature of complex software systems. This 
paper presents PLATINUM, a technique for efficient analysis of evolving Alloy 
specifications, that recognizes opportunities for constraint reduction and reuse of 
previously identified constraint solutions. The insight behind PLATINUM is that 
formula constraints recur often during the analysis of a single specification and 
across its revisions, and constraint solutions can be reused over sequences of anal- 
yses performed on evolving specifications. Our empirical results show that PLAT- 
INUM substantially reduces (by 66.4% on average) the analysis time required on 
specifications extracted from real-world software systems. 


1 Introduction 


The growing reliance of society on software and software-intensive systems drives 
a continued demand for increased software dependability. Software verification pro- 
vides the highest degree of software assurance, with its strengths residing in the math- 
ematical concepts that can be leveraged to prove correctness with respect to specific 
properties. Most notably, bounded verification techniques, such as Alloy [28], have 
recently received a great deal of attention in the software engineering community 
(e.g., [8, 9, 11, 13, 14, 16, 20, 26, 34, 35, 38, 43, 46, 48, 52, 54, 55, 61, 63, 66]), 
due to the strength of their automated, yet formally precise, analysis capabilities. The 
basic idea behind these techniques is to construct a formula that encodes the behavior 
of a system and examine it up to a user-specified bound. They thus enable analyses of 
partial models that represent key aspects of a system. 

Bounded verification techniques often transform a software specification to be an- 
alyzed into a satisfiability problem, and delegate the task of solving this to a con- 
straint solver. In the past decade, constraint solving technologies have made spectacular 
progress (e.g., [19, 22, 42]). Despite these advances, however, constraint solving con- 
tinues to be a bottleneck in analyses that rely on it [58]. This is because the magnitude 
of formulas tends to increase exponentially with the size of the system being analyzed, 
making it impractical to employ constraint solving on complex systems. Further, de- 
spite the many optimizations applied to constraint solvers, they are still unable to detect 
many instances of subformula recurrence that are generated by Alloy.

The foregoing challenges are exacerbated when considering the ever-evolving nature
of complex software systems and their corresponding specifications. Formal specifications
are developed iteratively, and each iteration involves repeated runs of the analyzer
for assessment of their semantics [31, 36]. In online analyses, where specifications
are kept in sync with the evolving software and analyses are performed at runtime, the
time required to verify the properties of software is of even greater significance. This
calls for techniques that assist constraint solvers in dealing with large corpora of formulas,
many of which contain tens of thousands of clauses.

In this paper, we introduce PLATINUM, an extension of the Alloy Analyzer that sup- 
ports efficient analysis of evolving Alloy specifications, by recognizing opportunities 
for constraint reduction and reuse of previously identified constraint solutions. Unlike 
the Alloy Analyzer and its other variants, e.g., Aluminum [45], that dispose of prior 
results in response to changes in the system specification, PLATINUM stores solved 
constraints incrementally, and retrieves them when they are needed again within the 
analysis of the revised specification. PLATINUM further improves analysis efficiency by 
omitting redundant constraints from a specification before translating them into propo- 
sitional formulas to be solved by expensive constraint solvers, thereby greatly reducing 
the required computational effort. Although techniques for storing the results of satisfi- 
ability checking and reusing them later have been considered in the context of symbolic 
execution [6, 7, 29, 49, 62], these techniques cannot be directly applied to Alloy due 
to the specifics of its core logic, which consolidates the quantifiers of first-order logic 
with the operators of the relational calculus [28]. (Section 5 provides details.) 

We evaluate the performance of PLATINUM in several scenarios. First, we apply 
PLATINUM to several pairs of specifications in which the second contains a small but 
non-trivial set of changes relative to the first. Second, we apply PLATINUM to several 
sequences of specifications that model evolution scenarios. Our empirical results show 
that PLATINUM is able to support reuse of constraint solutions both within a single 
analysis run and across a sequence of analyses of evolving specifications, while achiev- 
ing speed-up over the Alloy Analyzer. Third, we show that as the scope of the analysis 
increases, PLATINUM achieves even greater improvements. Fourth, we show that the 
overhead associated with PLATINUM is a fraction of that required by the Alloy An- 
alyzer. Finally, we show that PLATINUM substantially reduces (by 66.4% on average) 
the analysis time required on specifications extracted from real-world software systems. 

This paper makes the following contributions: 


— Efficient analysis of evolving relational logic specifications. We present a novel ap- 
proach to improve the bounded analysis of relational logic specifications by trans- 
forming constraints into more concise forms and enabling substantial reuse of so- 
lutions, which in turn substantially reduces analysis costs. 

— Tool implementation. We implement PLATINUM as an extension to Alloy and its 
underlying relational logic analyzer, Kodkod [57]. We make PLATINUM available 
to the research and education community [5]. 

— Empirical evaluation. We evaluate PLATINUM in the context of Alloy specifications
found in prior work and specifications automatically extracted from real-world systems,
corroborating PLATINUM’s ability to substantially outperform the Alloy Analyzer
without sacrificing soundness or completeness.


2 Illustrative Example


To motivate this research and illustrate our approach, we provide a simple Alloy specification
and describe the analysis process followed by the Alloy Analyzer and PLATINUM.

Consider snippets of the Alloy specification for a simple customer-order class
diagram, shown in Listing 1.1 (adapted from [15]). Each Alloy specification consists
of data types and formulas that define constraints over those data types. A
signature (sig) paragraph introduces a basic data type and a set of its relations,
called fields, accompanied by the type of each field. The running example defines
seven signatures (Lines 2-21). The Customer class (Lines 2-7) has two attributes,
customerID and customerName, that are assigned to the attrSet field of the
Customer class. The id field specifies that customerID is the identifier of this class.
The last two lines of the Customer signature specification indicate that Customer
is not an abstract class and that it has no parent. Similarly, the code in Lines 10-15
represents the Order signature specification, and CustOrder (Lines 18-21)
specifies an association relationship between Customer and Order.

     1 // (a) a simple customer-order class diagram
     2 one sig Customer extends Class{}{
     3   attrSet = customerID + customerName
     4   id = customerID
     5   isAbstract = No
     6   no parent
     7 }
     8 one sig customerID extends Integer{}
     9 one sig customerName extends string{}
    10 one sig Order extends Class{}{
    11   attrSet = orderID + orderValue
    12   id = orderID
    13   isAbstract = No
    14   no parent
    15 }
    16 one sig orderID extends Integer{}
    17 one sig orderValue extends Real{}
    18 one sig CustOrder extends Association{}{
    19   src = Customer
    20   dst = Order
    21 }
    22 fact associationMultiplicity{
    23   one CustOrder.src and some CustOrder.dst
    24 }

     1 // (b) new constructs added to the revised specification
     2 one sig PreferredCustomer extends Class{}{
     3   attrSet = discount
     4   one parent
     5   parent in Customer
     6   isAbstract = No
     7   id = customerID
     8 }
     9 one sig discount extends Integer{}

Listing 1.1: (a) a specification describing a simple customer order class diagram;
(b) new constructs added to a revised version of that specification.

Facts (fact) are formulas that take no arguments, and define constraints that
each instance of a specification must satisfy, restricting the specification’s solution
space. The formulas can be further structured using predicates (pred) and functions
(fun), which are parameterized formulas that can be invoked. The associationMultiplicity
fact paragraph (Lines 22-24) states multiplicities of source and destination classes in
the CustOrder association relationship.


To analyze such a relational specification, both the Alloy Analyzer and PLATINUM
translate it into a corresponding finite relational model in a language called
Kodkod [56]. Listing 1.2 shows a partial Kodkod translation of Listing 1.1(a). A
specification in Kodkod’s relational logic is a triple consisting of a universe of
elements (a.k.a. atoms), a set of relation declarations including lower and upper
bounds specified over the model’s universe, and a relational formula in which the
declared relations appear as free variables [56].

     1 {C1, O1}
     2
     3 Customer: (1,1)::[{<C1>},{<C1>}]
     4 ...
     5 parent: ...
     6   {<C1,C1>,<C1,O1>,<O1,C1>,<O1,O1>}]
     7
     8 (no Customer.parent) && (no Order.parent)

Listing 1.2: Kodkod representation of the Alloy specification of Listing 1.1
(partially elided for space and readability).

The first line of Listing 1.2 declares a universe of two uninterpreted atoms. (Due to 
space limitations, the listing omits some of the relations and atoms.) While in Kodkod 
all relations are untyped, in the interest of readability we assume an interpretation of 
atoms in which C1 represents a Customer element and O1 represents an Order element. 

Lines 3-6 of Listing 1.2 declare relational variables. Similar to Alloy, formulas in 
Kodkod are constraints defined over relational variables. Whereas in Alloy these rela- 
tional variables are separated into signatures that represent unary relations establishing 
a type system, and fields that represent non-unary relations, in Kodkod all relations are 
untyped, with no difference made between unary and non-unary variables. 

Kodkod also allows scope to be specified from above and below each relational
variable by two relational constants; these sets are called upper and lower bounds,
respectively. In principle, a relational constant is a pre-specified set of tuples drawn
from a universe of atoms. Each relation in a specification solution must contain all
tuples that appear in the lower bound, and no tuple that does not appear in the upper
bound. That is, the upper bound represents the entire set of tuples that a relational
variable may contain, and the lower bound represents a partial solution for a
specification.

     1 (!(v1|v2)) & (!v2|!v1) & (!(v3|v4)) & (!v4|!v3)
     2
     3 Slices:
     4 (!(v1|v2)) & (!v2|!v1)
     5 (!(v3|v4)) & (!v4|!v3)
     6
     7 Canonical form:
     8 (!(1|2)) & (!2|!1)

Listing 1.3: Excerpt of the boolean encoding for the Kodkod specification
shown in Listing 1.2.



Consider the Customer declaration (Listing 1.2, Line 3). Both its upper and lower
bounds contain just one atom, C1, given that it is defined as a singleton set in Listing 1.1.
The upper bound for the variable parent ⊆ Class × Class (Lines 5-6) is a product
of the upper bound set for its corresponding domain and co-domain relations, here
(Customer ∪ Order) → (Customer ∪ Order), taking every combination of an element
from both and concatenating them.
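
The bound computation just described can be illustrated with a small Python sketch. It models relations as sets of tuples over the atoms C1 and O1; the helper names `cross` and `within_bounds` are hypothetical, and the empty lower bound for parent is an assumption made for the example rather than something read off Listing 1.2.

```python
from itertools import product

# Atoms as interpreted in the text: C1 is a Customer, O1 is an Order.
customer = {("C1",)}
order = {("O1",)}


def cross(left, right):
    """Relational product: concatenate every tuple of left with every tuple of right."""
    return {a + b for a, b in product(left, right)}


def within_bounds(relation, lower, upper):
    """A candidate value is admissible iff lower <= relation <= upper (as sets)."""
    return lower <= relation <= upper


classes = customer | order
parent_upper = cross(classes, classes)  # (Customer ∪ Order) → (Customer ∪ Order)
parent_lower = set()                    # assumed empty for this illustration
```

Here `parent_upper` contains the four tuples of the upper bound, and any solution value for parent must pass `within_bounds` against the two bounds.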

To transform such a finite relational model into a boolean logic formula, Kodkod 
renders each relation as a boolean matrix, in which any tuple in the upper bound of 
the given relation that is not in the lower bound maps to a unique boolean vari- 
able [56]. Relational constraints are then captured as boolean constraints over the trans- 
lated boolean variables. 

To render this idea concrete, consider the parent relation along with the next con- 
straint defined over it (Listing 1.2, Lines 5-8). Each of the four tuples in the upper bound 
of the parent relation is allocated a fresh boolean variable (v1 to v4) in the boolean en- 
coding. The relational constraint (no Customer.parent) && (no Order.parent) is then 
translated as a boolean constraint over those boolean variables, as shown in List- 
ing 1.3, Line 1. 
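
A minimal Python sketch of this allocation step follows. It is not Kodkod's implementation: the function names are hypothetical, the ordering of variables is an assumption, and the encoding covers only the `no X.parent` constraints, so it yields fewer clauses than the full formula of Listing 1.3.

```python
def allocate_variables(lower, upper):
    """Map each tuple in upper but not in lower to a fresh variable v1, v2, ..."""
    free = sorted(upper - lower)
    return {t: f"v{i}" for i, t in enumerate(free, start=1)}


def no_join(atom, var_of):
    """Encode `no atom.parent`: no tuple whose first element is atom may be present."""
    vs = [v for t, v in sorted(var_of.items()) if t[0] == atom]
    return "!(" + "|".join(vs) + ")"


upper = {("C1", "C1"), ("C1", "O1"), ("O1", "C1"), ("O1", "O1")}
var_of = allocate_variables(set(), upper)

# (no Customer.parent) && (no Order.parent) over the allocated variables
formula = " & ".join(no_join(a, var_of) for a in ("C1", "O1"))
print(formula)  # !(v1|v2) & !(v3|v4)
```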

Expressions and constraints in relational specifications typically contain equivalent
slices in their boolean representations. PLATINUM detects such semantically redundant
slices by refining the specification in its boolean logic form into its essential, independently
analyzable slices, and then rendering them in a canonical form. The boolean
encoding of the constraints defined over the parent relation, for example, embodies two
slices with equivalent but syntactically distinct formulas (Listing 1.3, Lines 4-5). Line 8
represents the result of restructuring the slices into a canonical form, showing that the
two slices are in fact equivalent. The slicing technique we use to determine the sets of
clauses, the satisfiability of which can be analyzed independently of other clauses in the
formula, is presented in Section 3.
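
To see why a canonical form exposes the equivalence of the two slices, consider the following Python sketch. It renames variables in order of first appearance, which is a deliberate simplification and not the weight-based renaming that PLATINUM applies (Section 3.2); the function name `canonicalize` is hypothetical.

```python
import re


def canonicalize(slice_text):
    """Rename variables to 1, 2, ... in order of first appearance."""
    labels = {}

    def rename(match):
        name = match.group(0)
        if name not in labels:
            labels[name] = str(len(labels) + 1)
        return labels[name]

    return re.sub(r"v\d+", rename, slice_text)


s1 = "(!(v1|v2)) & (!v2|!v1)"
s2 = "(!(v3|v4)) & (!v4|!v3)"
print(canonicalize(s1))  # (!(1|2)) & (!2|!1)
```

Both slices map to the same string, so only one of them needs to be solved.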

PLATINUM prevents redundant slices from being propagated to the CNF formula to 
be solved by the underlying SAT solver, substantially reducing computational effort. In 
the case of our example specification (Listing 1.1(a)), PLATINUM partitions the original 
relational specification into 30 slices, with only seven distinct canonical slices. As such, 
PLATINUM is faster at finding a solution instance, requiring 19 ms to do so compared 
to the 26 ms that the Alloy Analyzer requires to produce the first solution instance. The 
time required to compute the entire instance set also improves, from 6481 ms to 246 
ms, in this simple example. 

PLATINUM also reuses results produced for specification slices to further improve 
the analysis of evolving specifications. Consider Listing 1.1(b), for example, in which 
two new signature paragraphs are added, stating that the PreferredCustomer class in- 
herits from the Customer class. Given the updated specification, PLATINUM reuses 
the results from the prior run and solves a smaller problem. Specifically, after slicing 
and canonicalizing the formula, the results for 29 slices, out of the total of 30 slices, 
are already available. As a result, PLATINUM requires only one millisecond to find 
the first solution for the revised specification, whereas the Alloy Analyzer requires 
27 milliseconds to produce the first solution. PLATINUM also produces speed-ups in 
computing the whole solution space. In the case of this particular example, PLAT- 
INUM reduces the time required to produce the entire solution set from 768 millisec- 
onds to two milliseconds. 


3 Approach 


Fig. 1 provides an architectural overview that shows how PLATINUM fits in with Al- 
loy. As the figure shows (left), the Alloy Analyzer reads in an Alloy specification and 
translates it into a relational model, then passes that to Kodkod. Kodkod translates the 
relational model into a boolean formula, then to CNF, and passes the CNF to off-the- 
shelf SAT solvers to obtain a solution. Last, the Alloy interpreter translates the SAT 
result into a solution instance. 

PLATINUM is inserted between Kodkod and the Alloy interpreter, as shown in the 
figure. At the highest level, PLATINUM takes in the boolean formula from Kodkod and 
outputs SAT results to the Alloy interpreter. The box at right shows the steps PLATINUM 
follows to do this. PLATINUM first decomposes the boolean formula into independent 
slices. Then, for each slice, PLATINUM canonicalizes it into a normalized format and 
searches the storage for a previously existing equivalent slice. If such a slice exists, the 
previous results will be reused. Otherwise, the slice is translated to CNF and assigned to 
an independent SAT solver for processing. Both the slice and the results of processing it
are then stored. Finally, PLATINUM combines the results for each slice and passes them
to the Alloy interpreter.


Fig. 1: Overview of Alloy and PLATINUM

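
The lookup, solve, store, and combine steps of this workflow can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from PLATINUM itself: slices are given as lists of clauses in signed-integer (DIMACS-like) form, canonicalization is a simple first-appearance renaming, the SAT solver is replaced by brute-force enumeration, and the store is an in-memory dictionary. All function names are hypothetical.

```python
from itertools import product


def canonical(slice_clauses):
    """Rename variables by first appearance; clauses are tuples of signed ints."""
    label = {}
    out = []
    for clause in slice_clauses:
        renamed = []
        for lit in clause:
            idx = label.setdefault(abs(lit), len(label) + 1)
            renamed.append(idx if lit > 0 else -idx)
        out.append(tuple(renamed))
    return tuple(out)


def brute_force_sat(clauses):
    """Stand-in for a SAT solver: try every assignment (tiny slices only)."""
    n = max((abs(l) for c in clauses for l in c), default=0)
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False


def solve_with_reuse(slices, store):
    """Return (satisfiable, solver_calls); store maps canonical slice -> result."""
    calls = 0
    results = []
    for s in slices:
        key = canonical(s)
        if key not in store:          # not seen before: solve and remember
            store[key] = brute_force_sat(key)
            calls += 1
        results.append(store[key])    # otherwise reuse the stored result
    return all(results), calls
```

Passing the same `store` to successive calls mimics reuse across revisions: only slices whose canonical form has not been seen before reach the solver.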
Next, we describe each step taken by PLATINUM in detail. 


3.1 Slicing 


In PLATINUM, the slicing operation takes in the boolean formula generated from
Kodkod and decomposes it into a set of independently analyzable subformulas.
Formally, given a boolean formula $\varphi$, slicing decomposes it into subformulas
$\varphi_1, \varphi_2, \ldots, \varphi_n$, such that the following equations hold:

— $\varphi_1 \wedge \varphi_2 \wedge \ldots \wedge \varphi_n = \varphi$
— $\mathit{var}(\varphi_1) \cup \mathit{var}(\varphi_2) \cup \ldots \cup \mathit{var}(\varphi_n) = \mathit{var}(\varphi)$
— $\mathit{var}(\varphi_i) \cap \mathit{var}(\varphi_j) = \emptyset$, for each $\varphi_i$ and $\varphi_j$ where $i \neq j$
— $\mathit{var}(\varphi_i) \neq \emptyset$, for $i = 1, 2, \ldots, n$

where $\mathit{var}(\varphi)$ is the set of boolean variables of $\varphi$.
Subformulas $\varphi_1$ to $\varphi_n$ can be solved independently.
Thus, $\varphi$ is satisfiable if and only if each slice $\varphi_i$ is satisfiable individually.

Algorithm 1 Slicing
Require: f: original Boolean Formula root
Ensure: Slices: Set of Independent Slices
 1: procedure SLICING(f)
 2:   Slices ← null
 3:   for each variable v ∈ f do
 4:     parent[v] ← v
 5:     rank[v] ← v
 6:   end for
 7:   DECOMPOSE(f)
 8: end procedure
 9: procedure DECOMPOSE(f)
10:   if f.operator = AND then
11:     for each subformula fi ∈ f do
12:       DECOMPOSE(fi)
13:     end for
14:   else
15:     UNION-FIND(f)
16:   end if
17: end procedure


A boolean formula can be sliced either logically (based on semantics) or algebraically
(based on syntax). In the interest of efficiency, PLATINUM applies a syntactic
slicing algorithm. There are two types of boolean formulas in Alloy: a propositional
formula that Kodkod translates from the relational model and the conjunctive
normal form generated from the propositional formula. PLATINUM applies slicing at
the propositional formula level for two reasons. First, translating a propositional formula
to CNF introduces many auxiliary variables [21]. For example, when the CustomerOrder
specification in Section 2, with 81 variables in its propositional formula, is
translated to a CNF formula containing 352 variables, 271 auxiliary variables are introduced.
The explosion in the number of variables affects the performance of slicing
and canonicalization. Second, in certain cases, auxiliary variables connect two independent
formulas together. Given the boolean formula v1 & v2, its CNF translation is
(v1 | !o) & (v2 | !o) & (!v1 | !v2 | o), where o is the auxiliary variable. Even if v1 and v2 are
independent formulas, in the CNF, v1 and v2 are dependent on each other.
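
The effect of the auxiliary variable can be checked by exhaustive enumeration. The Python sketch below verifies that the three clauses quoted above hold exactly when o equals v1 & v2, and that v1 and v2 now share a clause; the names `cnf`, `defines_o`, and `linked` are illustrative.

```python
from itertools import product


def cnf(v1, v2, o):
    """The CNF from the text: (v1 | !o) & (v2 | !o) & (!v1 | !v2 | o)."""
    return (v1 or not o) and (v2 or not o) and (not v1 or not v2 or o)


# The clauses are satisfied exactly when the auxiliary variable o equals v1 & v2.
defines_o = all(
    cnf(v1, v2, o) == (o == (v1 and v2))
    for v1, v2, o in product([False, True], repeat=3)
)

# Variables that occur in the same clause end up in the same slice.
clauses = [{"v1", "o"}, {"v2", "o"}, {"v1", "v2", "o"}]
linked = any({"v1", "v2"} <= clause for clause in clauses)

print(defines_o, linked)  # True True
```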


Slicing can be viewed as iden- 
tifying connected components in a 
graph, where the vertices of the 
graph are boolean variables and the 
edges of the graph represent two 
variables that appear within the same 
clause. Each slice is thus one con- 
nected component in the graph. The 
conventional way to proceed with 
this is to first build a graph for the 
entire boolean formula, and then run 
a depth-first-search (DFS) to iden- 
tify each connected component [62]. 
For large specifications this can be 
both time and memory intensive. 
To improve performance, our algo- 
rithm applies a modified UNION- 
FIND algorithm [17], that traverses 
the boolean formula only once to 
identify connected components. 


Algorithm 1 outlines the slic- 
ing process. Given boolean formula 
root, the algorithm first initializes a 
data structure used by its subrou- 
tine (Lines 2—6). Each slice is iden- 
tified by a representative, which is 
one variable within the slice. Array 
Parent is used to find the represen- 
tative variable. Array Rank is used 
to construct a balanced parent ar- 
ray. Array Slices maps a represen- 
tative variable to its corresponding 
slice; its size equals the number of 
slices. The algorithm then calls sub- 
routine DECOMPOSE to decompose 
the root formula. 


Algorithm 2 Union-Find
 1: procedure UNION-FIND(f)
 2:   represent ← null
 3:   for each variable v ∈ f do
 4:     if v has been visited then
 5:       if UnMeetState then
 6:         represent ← FINDSLICE(v)
 7:         add f to Slices[represent]
 8:         change to MeetState
 9:       else
10:         if FINDSLICE(v) != FINDSLICE(represent) then
11:           UNIONSLICES(Slices[represent], Slices[v])
12:         end if
13:       end if
14:     else
15:       UNIONVARS(v, represent)
16:       v.visited ← TRUE
17:     end if
18:   end for
19: end procedure
20: procedure UNIONVARS(v, represent)
21:   if represent is null then
22:     represent ← FINDSLICE(v)
23:   end if
24:   Parent[represent] ← FINDSLICE(v)
25:   Rank[represent] ← Rank[represent] + 1
26: end procedure
27: procedure UNIONSLICES(represent, v)
28:   v ← FINDSLICE(v)
29:   if Rank[represent] < Rank[v] then
30:     Slices[v].add(Slices[represent])
31:     Parent[represent] ← v
32:     Rank[v] ← Rank[represent] + Rank[v]
33:   else
34:     Slices[represent].add(Slices[v])
35:     Parent[v] ← represent
36:     Rank[represent] ← Rank[represent] + Rank[v]
37:   end if
38: end procedure
39: procedure FINDSLICE(v)
40:   while v != Parent[v] do
41:     v ← Parent[v]
42:     Parent[v] ← Parent[Parent[v]]
43:   end while; return v
44: end procedure


DECOMPOSE recursively partitions a boolean formula f into subformulas in such
a way that the conjunction of all subformulas equals f, and each subformula cannot be
decomposed into smaller subformulas.


The UNION-FIND procedure (Algorithm 2) takes a decomposed subformula and
finds a slice to which it belongs. The basic idea behind the algorithm is that each slice is
represented by one variable. UNION-FIND has two basic operators: UNION and FIND.
If UNION operates on two slices, it joins them into one slice (Lines 27-38). If UNION
operates on two variables, it assigns one variable to be the parent of the other (Lines 20-26).
The FINDSLICE operation determines the representative variable of the slice to
which the input variable belongs. It does so by traversing the Parent array
until it finds one variable $v_p$ whose parent is itself, i.e., $Parent[v_p] = v_p$. All variables
along this path belong to the same slice and are represented by $v_p$.

The input boolean formula has two states: UnMeetState, which indicates that f does
not belong to any slice yet, and MeetState, which indicates that f belongs to some slice
that is represented by represent. For each variable v of the input boolean formula f,
UNION-FIND first obtains the representative variable for v (which could be v itself if v
does not belong to any slice yet). If v has not been visited, the algorithm unions v and
the representative variable of the subformula (Lines 20-26). Otherwise, if v has been
visited (i.e., it belongs to some slice), and if f is in UnMeetState, then the algorithm
adds f to the slice represented by represent. Finally, if f is in MeetState, this means that
f belongs to one slice and v belongs to another and these need to be joined together
(Lines 27-38).
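
The idea of slicing with a union-find structure can be sketched compactly in Python. This is a simplified two-pass variant (first union the variables, then group the conjuncts), not the single-pass bookkeeping of Algorithm 2; it assumes that each top-level conjunct is given as a non-empty set of variable names, and it uses union by size with path halving. The function name `slice_formula` is hypothetical.

```python
def slice_formula(conjuncts):
    """Group top-level conjuncts into slices that share no variables.

    conjuncts: list of non-empty sets of variable names, one per conjunct
    (a simplified, hypothetical view of the propositional formula).
    Returns a sorted list of slices, each a list of conjunct indices.
    """
    parent = {}
    size = {}

    def find(v):
        # Path halving: point v at its grandparent while climbing to the root.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra            # attach the smaller tree under the larger one
        size[ra] += size[rb]

    # Pass 1: variables that occur in the same conjunct join the same set.
    for variables in conjuncts:
        for v in variables:
            parent.setdefault(v, v)
            size.setdefault(v, 1)
        first, *rest = sorted(variables)
        for v in rest:
            union(first, v)

    # Pass 2: a conjunct belongs to the slice of its representative variable.
    slices = {}
    for index, variables in enumerate(conjuncts):
        root = find(min(variables))
        slices.setdefault(root, []).append(index)
    return sorted(slices.values())
```

For the four conjuncts of Listing 1.3, given as `[{"v1", "v2"}, {"v2", "v1"}, {"v3", "v4"}, {"v4", "v3"}]`, the function returns `[[0, 1], [2, 3]]`, i.e., two independent slices.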


3.2 Canonicalization 


The time complexity of the UNION-FIND algorithm is near linear [17]. Without this
improvement and using the conventional DFS-based approach taken by Green [62] among
others, in one case in our empirical study, a few minutes were required to produce
independently analyzable slices. Using our algorithm, this time was reduced to about
10 milliseconds, a speedup of several orders of magnitude. This speedup occurs for the
following reason. A graph is needed to start the DFS. The graph contains information
about which variable belongs to which clause and which clause contains which variables,
and a map-like data structure is needed to store this information. When the number of
variables becomes huge (typically hundreds of thousands in formulas produced for Alloy
specifications of real-world systems) it is time and memory consuming to obtain this
information and store it. It is also time consuming to retrieve the graph information
during the DFS. Our UNION-FIND based algorithm, in contrast, requires information only
on the node's parents, and this can be placed in a static array that requires only
linear time to store and retrieve.

Algorithm 3 Canonicalization
Require: f: boolean formula
Ensure: f': canonical boolean formula
      ...
      varSet ← sort(varSet)
      for ... do
          varSet[i].label ← i
      end for
      ...
      for each subformula sf ∈ f do
          RENAME(sf)
      end for
      ...
 13: end procedure
 14: procedure RENAME(f)
 15:     for each subformula sf ∈ f do
 16:         RENAME(sf)
 17:     end for
      ...
 20: end procedure

The slices produced by the prior step are passed to this step, which transforms each
slice into a canonical format in order to capture the syntactic equivalence between
different slices. For a slice $\varphi$, where
$\varphi = \varphi_1 \land \varphi_2 \land \dots \land \varphi_n$, canonicalization
generates one boolean formula $\varphi'$, such that
$\varphi' = \varphi'_1 \land \varphi'_2 \land \dots \land \varphi'_n$, where
$\varphi'_i$ is the canonical format of $\varphi_i$. The canonical form of the formulas
is constructed by renaming variables and formula labels. Algorithm 3 outlines this
transformation.

Canonicalization first renames each boolean variable based on its weight (Lines 2-7).
For each variable $v \in V$, where
$V = \mathit{var}(\varphi_1) \cup \mathit{var}(\varphi_2) \cup \dots \cup \mathit{var}(\varphi_n)$,
the weight of v is calculated as the sum of the number of its occurrences and the number
of operators applied on v in all of the subformulas. To improve the performance of this
step, the weight for each variable is collected during the slicing phase; then, V is
sorted based on variable weight. If two variables have the same weight, their original
labels are used to sort them. Each variable is then renamed to its index in the sorted array.
The mapping relations from canonical variables to original variables for each slice are 
stored in labelMap for use in assembling the solution for the original boolean formula. 
Next, the label for each formula is renamed (Lines 8-20). The purpose of this step is 
to maintain consistency with variables when translating to CNF. The labels of formulas 
are used as auxiliary variables when they are translated to CNF. 
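
The variable-renaming part of this step can be illustrated with a small Python sketch. It is not the authors' code and rests on several assumptions: formulas are nested tuples such as ('and', 'a', ('not', 'b')) with variables as strings; "operators applied on v" is read as the number of operators enclosing each occurrence of v; variables are sorted by descending weight with ties broken by original label; and canonical names take the hypothetical form v0, v1, and so on. Renaming of formula labels (Lines 8-20) is not covered.

```python
from collections import Counter


def collect_weights(formula, depth, acc):
    """Add (1 occurrence + number of enclosing operators) for each variable occurrence."""
    if isinstance(formula, str):
        acc[formula] += 1 + depth
    else:
        _op, *args = formula
        for arg in args:
            collect_weights(arg, depth + 1, acc)


def rename(formula, mapping):
    """Rebuild the formula with every variable replaced by its canonical name."""
    if isinstance(formula, str):
        return mapping[formula]
    op, *args = formula
    return (op, *(rename(arg, mapping) for arg in args))


def canonicalize(slice_):
    """Return (canonical slice, label_map from canonical names to original names)."""
    weights = Counter()
    for sf in slice_:
        collect_weights(sf, 0, weights)
    # Assumed order: heavier variables first, original label as the tie-breaker.
    ordered = sorted(weights, key=lambda v: (-weights[v], v))
    mapping = {v: f"v{i}" for i, v in enumerate(ordered)}
    label_map = {new: old for old, new in mapping.items()}
    return tuple(rename(sf, mapping) for sf in slice_), label_map
```

Under these assumptions, two slices that differ only in their variable names, such as [('and', 'a', ('not', 'b')), ('or', 'b', 'c')] and [('and', 'x', ('not', 'y')), ('or', 'y', 'z')], map to the same canonical tuple, which is what allows a stored result to be recognized later.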


3.3 Storing and Reuse 


After slicing and canonicalization have been completed, each boolean formula is de- 
composed into several independent formulas. For each canonicalized boolean formula, 
PLATINUM checks its hash code in storage. If there is a hit, this boolean formula is 
already solved, and the result will then be retrieved. If not, the boolean formula will be 
translated into CNF and solved by the SAT solver independently. The result will then 
be stored. 

After solving all slices, using the labelMap (Algorithm 3) that maps canonical vari- 
ables to original variables, PLATINUM obtains the solution for the original boolean 
formula and passes it to Alloy to generate a solution instance. 
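
The store-and-reuse step amounts to a keyed lookup followed by a translation back to the original variable names. The sketch below is illustrative only: the choice of SHA-256 over the formula's `repr` as the hash code, the in-memory dictionary used as storage, and the `sat_solve` callable (standing in for CNF translation plus the SAT solver) are assumptions, not details taken from PLATINUM.

```python
import hashlib


def cache_key(canonical_formula):
    """Hash code of a canonical formula (assumed: SHA-256 of its textual form)."""
    return hashlib.sha256(repr(canonical_formula).encode("utf-8")).hexdigest()


def solve_slice(canonical_formula, label_map, cache, sat_solve):
    """Solve one canonical slice, reusing a stored result when one exists.

    sat_solve is a hypothetical callable that returns a dict from canonical
    variable names to booleans, or None if the slice is unsatisfiable.
    """
    key = cache_key(canonical_formula)
    if key in cache:
        canonical_solution = cache[key]  # hit: no solver call needed
    else:
        canonical_solution = sat_solve(canonical_formula)
        cache[key] = canonical_solution  # store for later analyses
    if canonical_solution is None:
        return None
    # Map canonical variables back to the original ones via labelMap.
    return {label_map[v]: value for v, value in canonical_solution.items()}
```

Because the cache stores solutions over canonical names, one stored entry can serve slices from different specifications, each with its own `label_map`.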


4 Empirical Study 


We empirically evaluated the performance of PLATINUM in relation to the following 
research questions: 


RQ1: How does the performance of PLATINUM compare to the performance of existing 
approaches on specifications that have undergone relatively small amounts of change? 


RQ2: How does the performance of PLATINUM compare to the performance of exist- 
ing approaches on specifications that have gone through several successive rounds of 
evolution? 


RQ3: How does the performance of PLATINUM compare to the performance of existing 
approaches on specifications that have run against higher scopes? 

RQ4: What is the overhead of PLATINUM in restructuring a relational logic formula 
into its canonical form? 

RQ5: How does the performance of PLATINUM compare to the performance of the Alloy
Analyzer in practice on specifications automatically extracted from real-world
applications?


4.1 Objects of Analysis


Our objects of analysis are specifications drawn from a variety of sources and problem
domains. These specifications vary widely in terms of size and complexity. Table 1 lists
the specifications that we use, with statistics on their size in terms of the numbers of
relations in their underlying logic. Note that this number, in turn, represents the sum
of the numbers of signatures and fields, as both are indeed translated into relations in
the underlying relational logic.

Table 1: Objects of Analysis

Specification     # Rels
Ecommerce             70
Decider               47
CSOS                  64
Wordpress             54
Andr. Bundle 1       665
Andr. Bundle 2       558
Andr. Bundle 3       485
Andr. Bundle 4       569
Andr. Bundle 5       501
Andr. Bundle 6       456

Ecommerce is a model, adopted from Lau and Czarnecki [30], that represents a common
architecture for open-source and commercial E-commerce systems. Decider [15] is a model
of a system to support design space exploration. CSOS is a model of a cyber-social
operating system meant to help coordinate people and tasks. WordPress is an object model
obtained by reverse engineering an open-source blog system [3]. Finally, the last six
rows of the table correspond to six large specifications intended for the analysis of
security properties in the context of the Android platform. Each consists of a bundle of
Android applications installed on a mobile device for detecting security vulnerabilities
that may arise due to inter-application communication, adopted from [12].

For the first four objects of analysis, we do not have access to actual, modified ver- 
sions of their Alloy specifications, and even if we did, there would not likely be enough 
versions to provide data sufficient to support quantitative analyses. Thus, instead, we 
used a mutation-based approach to create modified versions of the specifications. We
derived a list of mutation operators from the edit operations for Alloy specifications
[10] that are incorporated into the MuAlloy mutation system [64]. Table 2 provides a
list of these mutation operators, together with short descriptions.

Table 2: Mutation Operators

Operator  Description
ADS       Add a new signature
DLS       Delete a signature without children
CSM       Change the signature multiplicity, i.e., to set, one, lone or some (one that
          is different from the multiplicity defined in the original specification)
ABS       Make an abstract signature non-abstract or vice versa
MOV       Move a sub-signature to a new parent signature
ADF       Add a new field to a signature declaration
DLF       Delete a field from a signature declaration
CFM       Change a multiplicity constraint in a field declaration

To investigate RQ1 we wished to apply our mutation operators to create 30 modified
versions of each of our objects of study. Because prior work by Li et al. [31] showed
that users tend to modify Alloy specifications incrementally by small amounts, we chose
to create versions of our object specifications by mutating between one and 10% of the
relations in the specifications. Given object specification S, for each modified
specification S' of S to be created, we randomly chose a number N in this range; N
denotes the number of mutations to apply to S. We then began randomly choosing
relations L in S', then randomly choosing a mutation operator M applicable to L, and
applied M to S'. We did not allow a given L to be utilized more than once in this
process. Following each operator application, we ran Alloy on the current version of
S' to ensure that it is a valid specification. We repeated this process until N mutations
had been inserted into S'. Ultimately, this process produced 30 modified versions of
each object specification, wherein each version contained a randomly selected number
of randomly selected mutations, a number no greater than 10% of the number of
relations in the original specification.
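
The mutation procedure for RQ1 can be summarized as the following Python sketch. It is a schematic reading of the text, not the tooling used in the study: `applicable_ops`, `apply_op`, and `is_valid` are hypothetical callables standing in for operator selection, MuAlloy-style mutation, and the Alloy validity check, and discarding a mutation that yields an invalid specification is an assumption about how validity failures were handled.

```python
import random


def mutate(spec, relations, applicable_ops, apply_op, is_valid, rng=random):
    """Apply N random mutations, with N drawn between 1 and 10% of the relations.

    Each relation is used at most once. All callables are hypothetical stand-ins.
    """
    max_n = max(1, len(relations) // 10)
    n = rng.randint(1, max_n)
    unused = list(relations)
    rng.shuffle(unused)
    applied = []
    while len(applied) < n and unused:
        rel = unused.pop()  # a relation never seen before in this run
        ops = applicable_ops(spec, rel)
        if not ops:
            continue
        op = rng.choice(ops)
        candidate = apply_op(spec, rel, op)
        if is_valid(candidate):  # assumed: invalid candidates are dropped
            spec = candidate
            applied.append((rel, op))
    return spec, applied
```

Calling `mutate` 30 times on the same original specification corresponds to the RQ1 setup; feeding each result back in as the next input corresponds to the iterative evolution used for RQ2.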

To investigate RQ2 we used a similar process; however, in this case our goal was to
"evolve" each object specification S iteratively. Given the original version S, we created
a successor version S1 by repeating the process of inserting a randomly selected number
of randomly selected mutations (again, a number no greater than 10% of the number of
relations in S). However, our next iteration applied this same process to S1 (which now
contains a number of mutations) to produce a version S2 that potentially contains more
mutations. Here, we say "potentially" because we did not place any restrictions on the
re-use of mutation operators or mutation locations in subsequent versions Sk of S; thus,
conceivably, a mutation could be "undone" in a subsequent version. We repeated this
process 30 times on each specification, thereby obtaining a sequence of specifications
that have evolved iteratively.

It is common for users of bounded verification techniques such as Alloy to increase 
the scope of the analysis, in order to obtain greater confidence in the validity of the 
specification. As the scope of analysis increases, the space of cases that must be exam- 
ined expands dramatically. To investigate RQ3, we increased the scope of analysis on 
each of our object specifications. Note that the only change in the specification between 
two successive runs of the analyzer in this case was the scope of analysis. 

To investigate RQ4 we used the dataset created for RQ1. To investigate RQ5, we 
created six different app bundles, each containing 20 Android apps drawn from public 
app repositories such as Google Play [2]. We then used the COVERT tool [1] to auto- 
matically extract Alloy specifications from the app bundles. Given an original bundle 
B, we created a successor version B’ by adding a new app or removing an existing app 
(randomly selected) to/from the given bundle. The specifications automatically derived 
from app bundles tend to evolve as apps are added to, or removed from, the bundles. 
The resulting app bundles thus provide us with an ideal suite of evolving specifications 
that can be used for our evaluation. We repeated this process 30 times on each of the 
app bundles to produce 30 modified versions of each bundle specification. 


4.2 Variables and Measures 


Independent Variables As independent variables we wished to utilize PLATINUM, as 
well as baseline techniques representing state of the art approaches capable of perform- 
ing the same function as PLATINUM. 

We consider the Alloy Analyzer (version 4.2) as a baseline technique to compare 
against PLATINUM. The other potential baseline technique is Green [62], an optimiza- 
tion technique that, during symbolic execution, memoizes and reuses the results of satis- 
fiability checking. The current implementation of Green, however, has two fundamental 
problems in the context of this study. First, while Green supports the use of Integer and
Real variables in expressions, it does not support the use of boolean variables, which 
are widely used in the context of Alloy’s relational logic. We were able to work around 
this challenge, however, by modeling boolean variables as Green’s Integers and limit- 
ing their size to zero and one — an approach suggested by Green’s developers. A more 
insidious problem, however, is that the Green framework does not currently support 
constraints with the disjunction operator. Because Alloy specifications are in relational 
logic, native support for the disjunction operator is essential to effectively analyze such 
specifications. This issue has been reported to the Green repository [4], and we have 
been in contact with the authors about it; however, to date, the issue has not been re- 
solved and there are no workarounds for it. Thus, we were ultimately unable to use 
Green as a baseline technique. 

Beyond (a) the analysis technique, the additional independent variables used were (b) the
size of specifications in terms of relations in the relational logic, (c) the number of
mutation operations, (d) the type of mutation operations, and (e) the scope of the analysis.


Dependent Variables We measure several dependent variables. The first variable, anal- 
ysis time, tracks performance directly. Here, we measure the wall clock time required 
to run (1) a complete Alloy analysis and (2) a complete PLATINUM analysis on each 
specification considered. The second variable is the number of unique, independently 
analyzable slices produced by PLATINUM for each specification under analysis. The 
third variable is the number of slices for which solutions are already available for each 
specification under analysis. Finally, the fourth variable is the size of the generated CNF 
formulas that must be solved by the underlying SAT solver. In the last case, we record 
the number of CNF variables and clauses produced by each of the two techniques when 
translating high-level Alloy specifications into SAT formulas. 


4.3 Study Operation 


For RQ1 and RQ3, for each of our specification pairs, we applied the Alloy Analyzer 
and PLATINUM, measuring the time required by each approach, and the number of 
variables and clauses at the SAT level produced by each tool. 

For RQ2, for each of our specification sequences, we applied both the Alloy Ana- 
lyzer and PLATINUM to each pair of successive specifications in the sequence, measur- 
ing, for each iteration, the time required by each approach, the size of the SAT formula 
produced by each tool, and the number of slices reused across sequences. 

For RQ4, for each of our specification pairs, we applied PLATINUM, measuring the 
time required for formula restructuring, including the slicing and canonicalization steps. 

Finally, for RQ5, for each of the specification pairs extracted from app bundles, we 
applied both the Alloy Analyzer and PLATINUM, measuring the time required by each 
approach. 

All of our runs of the Alloy Analyzer and PLATINUM were conducted on an 8-core 
2.0 GHz AMD Opteron 6128 system with 40 GB of memory. Both techniques leveraged 
SAT4J as the SAT solver across the entire study to keep extraneous variables constant. 


4.4 Threats to Validity


External validity threats concern the generalization of our findings. We have studied 
ten sets of Alloy specifications and cannot claim that they are representative of all 
such specifications. Additionally, our modified specifications for the first four objects 
of analysis were created via a mutation approach, and while this allows us to obtain 
large amounts of data, these objects may not directly represent modified specifications 
that exist in practice. To reduce this threat and help determine whether our results may 
generalize, we conducted additional studies using real-world software systems, where 
both the Alloy specifications and their revisions are automatically extracted from evolv- 
ing bundles of real Android apps. Finally, different versions of the Alloy Analyzer may 
leverage different translation algorithms to CNF, and this may affect the execution time 
of the analyzer. To reduce this threat we used the latest stable release of the Alloy Ana- 
lyzer, Alloy Analyzer 4.2, for all runs collected in the study. 

Construct validity threats concern our metrics and measures; we are aware of no 
such threats in this case. 


4.5 Results for RQ1 (Small Changes) 


We first assess the effectiveness of PLATINUM with respect to the incremental changes
derived from our first four object specifications. The boxplots in Fig. 2 depict the size
of the generated CNF formulas, given as the number of variables (Fig. 2a) and clauses
(Fig. 2b) across mutations for each object of study. The results show that in comparison
to the Alloy Analyzer, PLATINUM's translation of relational logic specifications results
in much smaller and simpler SAT formulas, and the numbers of CNF variables and clauses
generated by PLATINUM were smaller than the numbers generated by Alloy. Specifically, in
the analyses of the CSOS, Decider, Ecommerce, and Wordpress specifications, the numbers
of variables and the numbers of clauses in the formulas produced by PLATINUM on average
were 4.5/2.6/5.1/3.5 and 2.1/1.4/2.0/1.7 times lower, respectively, than the numbers in
the formulas produced by the Alloy Analyzer. This is because already analyzed slices do
not need to be translated into SAT formulas, thus reducing the sizes of the generated
CNF formulas.

Fig. 2: Sizes of generated CNF formulas in terms of the number of (a) variables and (b)
clauses produced by the Alloy Analyzer and PLATINUM across mutations for each object of
study.
Table 3 shows the results of a comparison of the average analysis times required by
the Alloy Analyzer and PLATINUM across the four objects of study. On average, PLATINUM
exhibited a 67.16% improvement over the Alloy Analyzer, with the average improvement
across objects of study ranging from 16.82% to 82.31%.

Table 3: Performance Statistics

             Alloy           PLATINUM
             Analysis Time   Analysis Time   % Improvement
Ecommerce    280.92          49.69           82.31%
CSOS         120.64          56.71           52.99%
Wordpress     57.19          47.57           16.82%
Decider       27.38           5.69           79.21%
Average      121.53          39.91           67.16%

These results demonstrate the potential effectiveness of our optimization technique,
because in every case, the analysis time required by PLATINUM to find solution
instances of mutated specifications was less than that required by the state of the art
analysis techniques.
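
As a quick consistency check on Table 3, the improvement figures can be recomputed from the two time columns. The sketch below assumes that "% Improvement" means (Alloy time - PLATINUM time) / Alloy time; the paper does not spell the formula out, but this reading reproduces the table to within rounding. The variable and function names are illustrative.

```python
# Average analysis times from Table 3: (Alloy Analyzer, PLATINUM).
TIMES = {
    "Ecommerce": (280.92, 49.69),
    "CSOS": (120.64, 56.71),
    "Wordpress": (57.19, 47.57),
    "Decider": (27.38, 5.69),
}


def improvement(alloy_time, platinum_time):
    """Relative reduction in analysis time, in percent (assumed definition)."""
    return 100.0 * (alloy_time - platinum_time) / alloy_time


per_object = {name: improvement(a, p) for name, (a, p) in TIMES.items()}

# The "Average" row: improvement computed from the averaged times.
avg_alloy = sum(a for a, _ in TIMES.values()) / len(TIMES)
avg_platinum = sum(p for _, p in TIMES.values()) / len(TIMES)
overall = improvement(avg_alloy, avg_platinum)

for name, value in per_object.items():
    print(f"{name}: {value:.2f}%")
print(f"Average: {overall:.2f}%")
```

Under this reading, the 67.16% in the "Average" row is obtained from the averaged times (121.53 and 39.91) rather than as the mean of the four per-object percentages, which would be about 57.8%.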


4.6 Results for RQ2 (Successive Changes)


To assess the effectiveness of PLATINUM in accelerating analysis in successive runs on
evolving specifications we use two performance metrics: time ratio (TR) and variable
ratio (VR). We define the time ratio as $\mathit{TR} = t_P / t_A$, where $t_P$ is the
analysis time taken by PLATINUM and $t_A$ is the analysis time taken by the Alloy
Analyzer. Intuitively, lower values of TR imply greater speedup. A TR of 0.5, for
example, indicates that PLATINUM is two times faster than the Alloy analysis of the same
specification, whereas a TR of 0.1 indicates that PLATINUM is 10 times faster. Similarly,
we define the variable ratio as $\mathit{VR} = \mathit{var}_P / \mathit{var}_A$, where
$\mathit{var}_P$ is the number of variables in a SAT formula produced by PLATINUM and
$\mathit{var}_A$ is the number of variables in a SAT formula produced by the Alloy
Analyzer for the same specification. Again, lower values of VR imply that there are
fewer variables in a formula generated by PLATINUM than in a formula generated by the
Alloy Analyzer. We started PLATINUM with an empty cache, and then analyzed each mutation
in turn, continually populating the cache.

Fig. 3: Speedup and reuse during successive mutation analyses across subject domains.
The left column represents scatter plots of time ratios (Analysis time taken by
PLATINUM / Analysis time taken by Alloy), and the right column represents scatter plots
of reuse ratios (#Variables in the SAT formula transformed by PLATINUM / #Variables in
the SAT formula transformed by the Alloy Analyzer) across systems.

Fig. 3 presents a pair of diagrams for each of the four object specifications, demon- 
strating speedup and reuse during successive mutation analyses. The left column repre- 
sents scatter plots of time ratios across subject domains, and the right column represents 
scatter plots of variable ratios. All four sets of experiments exhibit similar behavior: in 
every case, and for every revision, the analysis time taken by PLATINUM is less than 
that of using the Alloy Analyzer (values of TR are always less than 1), and the num- 
ber of variables in formulas generated by PLATINUM is significantly less than those 
generated by the Alloy Analyzer. The speedup, however, varies for different mutations. 
Variation across mutations is expected, given that the size and complexity of the muta- 
tions produced in successive runs differ greatly. In a few cases, the values for TR jump. 
Investigation of the data shows that this occurred because the mutations present in those 
cases contained several new slices not yet observed, which in turn reduced the amount 
of reuse. Despite these few cases, the empirical results suggest that significant speedup 
was possible in all cases. 


4.7 Results for RQ3 (Scope Changes) 


Alloy’s analysis is exhaustive, yet bounded, up to a user-specified scope on the size 
of the domains. In cases in which the analyzer fails to produce a solution that satisfies 
specification constraints within a given scope, a solution may be found in a larger scope. 
In practice, Alloy users often conduct consecutive analysis runs of specifications, ap- 
plying small increases in the analysis scope, in the hopes of gaining further confidence 
in their results. It has been shown that 17.6% of consecutive Alloy analyses differ only 
in terms of their analysis scopes [31]. 

To examine how our optimization approach responds to increases in analysis scope, 
for each specification, we gradually increased the scope of the analysis. We set the 
initial scope for the analysis of each specification to the scope that had already been 
specified by its original modeler, reasoning that whoever had developed and analyzed 
the specification is most likely the best judge of the scope that is needed. The initial 
scopes for the CSOS, Decider, Ecommerce, and Wordpress specifications were 51, 27, 
50, and 32, respectively. We started PLATINUM with an empty cache for the analysis of 
each specification, and gradually populated it as the analysis scope increased. 

Table 4 shows the time ratios (TRs) measured as the analysis scope increased for each of
the objects of study. Recall that lower values for TR indicate that greater acceleration
was achieved by our optimization technique. The data shows that overall as scope
increased, TR tended to decrease. For example, for the Ecommerce system, the lowest value
for TR occurred when the scope was increased by five, resulting in a 1 / 0.031 = 32 fold
analysis speed acceleration.

Table 4: Analysis Time Improvements Over Increasing Sizes of Analysis Scope

Scope increase   +1      +2      +3      +4      +5
CSOS             0.765   0.122   0.098   0.118   0.035
Decider          0.393   0.036   0.038   0.023   0.034
Ecommerce        0.727   0.234   0.413   0.049   0.031
Wordpress        0.486   0.107   0.079   0.053   0.079
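
Since a time ratio is the PLATINUM time divided by the Alloy Analyzer time, the corresponding speedup factor is simply 1/TR. The short Python sketch below applies this to the Table 4 values; the dictionary layout and names are illustrative choices, not part of the study's tooling.

```python
# Time ratios from Table 4, indexed by scope increase +1 .. +5.
TIME_RATIOS = {
    "CSOS": [0.765, 0.122, 0.098, 0.118, 0.035],
    "Decider": [0.393, 0.036, 0.038, 0.023, 0.034],
    "Ecommerce": [0.727, 0.234, 0.413, 0.049, 0.031],
    "Wordpress": [0.486, 0.107, 0.079, 0.053, 0.079],
}


def speedup(tr):
    """Speedup factor implied by a time ratio TR = t_P / t_A."""
    return 1.0 / tr


def best_scope_increase(ratios):
    """Scope increase (1-based) at which the time ratio is lowest."""
    return min(range(len(ratios)), key=lambda i: ratios[i]) + 1


for name, ratios in TIME_RATIOS.items():
    k = best_scope_increase(ratios)
    print(f"{name}: lowest TR at +{k}, speedup {speedup(ratios[k - 1]):.1f}x")
```

For Ecommerce this gives the lowest ratio at a scope increase of +5, where 1 / 0.031 rounds to the 32-fold acceleration quoted in the text.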


4.8 Results for RQ4 (Overhead)


We next evaluate the performance of PLATINUM's formula restructuring analysis. Table 5
shows the time required to restructure relational logic formulas into their canonical
forms. The first column represents the time spent decomposing formulas into independent
slices, and the second column represents the time spent canonicalizing them into
normalized formats.

Table 5: Analysis Times With Respect to Overhead Incurred Due to Restructuring of Formulas

             Slicing      Canon.
             Time (ms)    Time (ms)    % Overhead
CSOS         7            268          0.36%
Decider      3            41           0.63%
Ecommerce    10           116          1.01%
Wordpress    5            138          2.44%
Average      6.25         140.75       1.11%


As the data shows, the analysis time overhead incurred by these two steps is 1.11% 
on average, and no greater than 2.44% in any case. This is negligible, particularly when 
compared to the analysis time overhead incurred by the Alloy Analyzer (cf. Table 3). 
While the restructuring steps introduce little overhead, they substantially enable reuse 
of slice solutions, which in turn greatly reduces analysis costs. 

Fig. 4: Analysis times for the Alloy Analyzer and PLATINUM across specifications from
real-world Android apps.


4.9 Results for RQ5 (Real-World Specifications) 


Finally, to assess the improvements one could expect in practice using PLATINUM, we 
used Alloy specifications that were automatically extracted from real-world software 
systems and evolved versions thereof, as described in Section 4.1. Fig. 4 shows the re- 
sults of a comparison of the analysis time required by each of the two techniques as 
boxplots across the six bundle specifications. As the results show, PLATINUM exhibited 
a 66.4% improvement, on average, over the Alloy Analyzer; the average improvement 
across app bundles ranged from 44.2% to 78.4%, indicating relative stability across 
bundles. These results further confirm those obtained through our mutation-based
experiments, corroborating the effectiveness of PLATINUM in improving the analysis time
required by the Alloy Analyzer to find solution instances of revised specifications. 


5 Related Work 


The literature contains a large body of research related to ours. Here, we provide an 
overview of the most notable and closely related work and examine it in the light of our 
research. 

The widespread use of Alloy has prompted a number of extensions to the language 
and its underlying automated analyzer [10, 23, 24, 25, 27, 32, 37, 39, 40, 41, 45, 53, 
58, 59]. Among these, Titanium [10] presents an exploration space reduction strategy 
that narrows the space of values to be explored by an underlying constraint solver. This 
approach, however, requires an entire solution set to be produced for the original spec- 
ification, to determine tighter bounds for certain relations in the revised specification. 
Our work differs primarily in its emphasis on reducing constraints into a more concise 
form at the level of relational logic abstractions, which in turn allows for substantial 
reuse of analysis efforts in subsequent analyses. Research efforts on bound adjustment 
and solution reuse are complementary in that, in spite of the adjustments made to the 
analysis bounds, the solver still needs to solve for the shared constraints. 

Uzuncaova and Khurshid [60] partition a specification into base and derived slices, 
in which a solution to the base slice can be extended to produce a solution for the 
entire specification. PLATINUM is fundamentally different from this work in that the 
problem addressed by Uzuncaova and Khurshid assumes a fixed specification and does 
not consider specification evolution. Further, their approach does not eliminate the need 
to solve shared, canonicalized constraints across analyses. 

Rosner et al. [51] present a technique, Ranger, that leverages a linear ordering of 
the solution space to support parallel analysis of first-order logic specifications. While 
the linear ordering enables partitioning of the solution space into ranges, there is no 
clear way in which it can be extended with incremental analysis capabilities, which are 
crucial for effective analysis of evolving specifications. 

Several techniques attempt to explore specification instances derived from Alloy’s 
relational logic constraints [18, 33, 44, 45, 56]. Macedo et al. [33] examine scenario ex- 
plorations in the context of relational logic. Aluminum [45] extends the Alloy Analyzer 
to generate minimal specification instances. Both of these efforts focus primarily on the 
order in which solutions are produced, as opposed to facilitating analysis of evolving 
specifications, which is our goal. Montaghami and Rayside [39] extend the Alloy lan- 
guage to explicitly support partial modeling. Their work, however, does not consider 
evolving specifications. In fact, it is widely recognized that efficient techniques for ana- 
lyzing Alloy specifications are needed [58]. To the best of our knowledge, however, no 
prior research has attempted to reduce the need to call a solver to improve the efficiency 
of the analysis of evolving Alloy specifications. 

The technique most closely related to ours is Green [62]; this technique has been 
the subject of several more recent papers [6, 7, 29, 47, 49, 50, 65], that improve on its 
algorithm. As noted in Section 1, Green and its offshoots also rely on back-end
constraint solving engines. In contrast to all of this prior work, the problem we address
in this paper involves supporting the evolutionary analysis of relational logic. Among
other things, this requires the development of both original slicing and canonicalization
approaches appropriate for models specified in Alloy's relational logic. Moreover,
neither Green's slicer nor its canonicalizer takes into account the disjunction operator [4].
While the lack of support for the disjunction operator might be allowable in the con- 
text of symbolic execution, that support is essential in the context of first-order logic 
to allow an approach to effectively recognize opportunities for constraint reduction and 
reuse. Further, while most of the prior techniques use a classic lexicographic order- 
ing of the variables before transforming each slice into a canonical format, PLATINUM 
leverages a reverse shortlex order, in which the variables are first sorted by their weight 
and then sorted lexicographically. This choice improves the identification of syntactic 
equivalence between different slices. To the best of our knowledge, PLATINUM is the 
first technique for evolutionary analysis of relational logic specifications that operates
without requiring an entire solution set for the original specification. 


6 Conclusions 


We have presented PLATINUM, a novel extension to the Alloy Analyzer that substan- 
tially improves the process of analyzing evolving Alloy specifications. Our approach 
proceeds by storing solved constraints incrementally, and reusing them within subse- 
quent analysis of a revised specification. It also omits redundant constraints from spec- 
ifications before translating them into formulas that will be sent to constraint solvers. 
Our evaluation of PLATINUM shows that it is able to support substantial reuse of con- 
straint solutions across analyses of evolving specifications. Our empirical results show 
significant speedup over the Alloy Analyzer in various scenarios. Our evaluation also 
shows that as the scope of analysis increases, PLATINUM achieves even further im- 
provements, and that the overhead associated with the approach is negligible. Finally, 
our evaluation shows that PLATINUM continues to result in savings on specifications 
extracted from real-world software systems. 

Our future work involves extending the optimization ideas presented here to lever- 
age domain-specific knowledge. Specifically, we intend to explore the possibility of 
driving the automated discovery of domain-specific optimizations, wherein each system 
of interest can have bounded verification tailored to its specific characteristics. While 
such optimizations historically have arisen from the insights of a few dozen experts in 
software verification, we envision a bounded speculative analysis to identify how opera- 
tions permissible within a certain domain may impact the exploration space of bounded 
analyses, thereby facilitating efficient analysis of specifications in a given domain. 


Abstract. System development is not a linear, one-shot process. It proceeds
through refinements and revisions. To support assurance that the
system satisfies its requirements, it is desirable that continuous verifica- 
tion can be performed after each refinement or revision step. To achieve 
practical adoption, formal verification must accommodate continuous 
verification efficiently and effectively. Model checking provides develop- 
ers with information useful to improve their models only when a property 
is not satisfied, i.e., when a counterexample is returned. However, it is 
desirable to have some useful information also when a property is instead 
satisfied. To address this problem we propose TOrPEDO, an approach 
that supports verification in two complementary forms: model checking 
and proofs. While model checking is typically used to pinpoint model 
behaviors that violate requirements, proofs can instead explain why re- 
quirements are satisfied. In our work, we introduce a specific notion of 
proof, called Topological Proof. A topological proof produces a slice of 
the original model that justifies the property satisfaction. Because mod- 
els can be incomplete, TOrPEDO supports reasoning on requirements 
satisfaction, violation, and possible satisfaction (in the case where satis- 
faction depends on unknown parts of the model). Evaluation is performed 
by checking how topological proofs support software development on 12 
modeling scenarios and 15 different properties obtained from 3 exam- 
ples from literature. Results show that: (i) topological proofs are ~60% 
smaller than the original models; (ii) after a revision, in ~78% of cases, 
the property can be re-verified by relying on a simple syntactic check. 


Keywords: Topological Proofs · Iterative Design · Model Checking ·
Theorem Proving · Unsatisfiable Core.


1 Introduction 


One of the goals of software engineering and formal methods is to provide au- 
tomated verification tools that support designers in producing models of an 
envisioned system which follows a set of properties of interest. Designers benefit 
from automated support to understand why their system does not behave as 
expected (e.g., counterexamples), but they might find it also useful to retrieve 
information when the system already follows the specified requirements. While
model checkers provide the former, theorem provers sustain the latter. Theorem 
provers usually rely on some form of deductive mechanism that, given a set of 
axioms, iteratively applies a set of rules until a theorem is proved. The proof 
consists of the specific sequence of deductive rules applied to prove the theo- 
rem. In the literature, many approaches have dealt with an integration of model 
checking and theorem proving at various levels (e.g., [48,60,53,36]). These ap- 
proaches are oriented to produce certified model checking procedures rather than 
tools that actually help the design process. Even when the idea is to provide a 
practically useful framework [49,50], the output consists of deductive proofs that 
are usually difficult to understand and hardly connectable with the designer’s 
modeling choices. Moreover, verification techniques only take into account com- 
pletely specified designs. This is a remarkable limitation in practical contexts, 
where the designer may start by providing an initial, high-level version of the 
model, which is iteratively narrowed down as design progresses and uncertain- 
ties are removed [13,42,8,19,65,43]. A recent work [4,5] considered cases in which 
a partial knowledge of the system model is available. However, the presented 
approach was mainly theoretical and lacked a practical implementation. 

We formulate our problem on models that contain uncertain parts. We choose 
Partial Kripke Structures (PKSs) as a formalism to represent general models for 
the following reasons: (i) PKSs have been used in requirement elicitation to 
reason about system behavior from different points of view [19,8], and are a 
common theoretical reference language used in the formal method community 
for the specification of uncertain models (e.g., [26,9,27,10]); (ii) other modeling
formalisms commonly used in software development [23,64], such as Modal
Transition Systems [37] (MTSs), can be converted into PKSs through a simple 
transformation [26] making our solution easily applicable to those models. 

Kripke Structures (KSs) are particular instances of PKSs that represent com- 
plete models. Requirements on the model are expressed in Linear-time Temporal 
Logic (LTL). As such, the approach presented in the following is generic: it can 
be applied on models that contain uncertain parts (PKSs) or not (KSs), and can 
be easily adapted to support MTSs. 

Verification techniques that consider PKSs return three alternative values: 
true if the property holds in the partial model, false if it does not hold, and 
maybe if the property possibly holds, i.e., its satisfaction depends on the parts 
of the model that still need to be refined. As models are revised, i.e., they are 
modified during design iterations, designers need support to understand why 
properties are satisfied, or possibly satisfied. 

A comprehensive and integrated design framework able to support software 
designers in understanding such motivation is still missing. We tackle this prob- 
lem by presenting TOrPEDO (TOpological Proof drivEn Development frame- 
wOrk), a novel automated verification framework, that: 


(i) supports a modeling formalism which allows a partial specification of the 
system design; 
(ii) allows performing analysis and verification in the context of systems in which
“incompleteness” represents a conceptual uncertainty; 

(iii) provides guidance in producing model revisions through complementary out- 
puts: counterexamples and proofs; 

(iv) when the system is completely specified, allows understanding which changes
do or do not affect the satisfaction of certain properties.


TOrPEDO is based on the novel notion of topological proof (TP), which 
tries to overcome the complexity of deductive proofs and is designed to make 
proofs understandable on the original system design. A TP is a slice of the 
original model that specifies which part of it impacts the property satisfaction. 
If the slice defined by the TP is not preserved during a revision, there is no 
assurance that the property holds (possibly holds) in the revised model. This 
paper proposes an algorithm to compute topological proofs—which relies on the 
notion of unsatisfiable cores (UCs) [56]—and proves its correctness on PKSs. It 
also proposes an algorithm that checks whether a TP is preserved in a model 
revision. This simple syntactic check avoids (in many cases) the execution of the 
model checking procedure. While architectural decomposition and composition 
of components can be considered during the system development [42], in this 
work we present our solution by assuming that the system is modeled as a single 
PKS. However, our framework can be extended to consider the composition of 
components, such as the parallel composition of PKSs or MTSs. This can be done 
by extracting the portions of the TP that refer to the different components. 

TOrPEDO has been implemented on top of NuSMV [14] and PLTL-MUP [58]. 
The implementation has been exploited to evaluate TOrPEDO by considering 
a set of examples coming from literature including both completely specified 
and partially specified models. We considered 3 different example models and 4 
variations for each model that was presented in the literature [12,20]. We considered
15 properties, i.e., 5 for each example, leading to a total of 60 (3×4×5)
scenarios that require the evaluation of a property on a model. We evaluated 
how our framework supports model design by comparing the size of the gener- 
ated topological proofs against the size of the original models. Results show that 
topological proofs are ~60% smaller than the original models. Moreover, after 
a revision, in ~78% of cases, our syntactic check avoids the re-execution of the 
model checker. 

Organization. Section 2 describes TOrPEDO. Section 3 discusses the back- 
ground. Sections 4 and 5 present the theoretical results and the algorithms that 
support TOrPEDO. Section 6 evaluates the achieved results. Section 7 discusses 
related work. Section 8 concludes. 


2 TOrPEDO 


TOrPEDO is a proof-based development framework which allows verifying PKSs
and evaluating their revisions. To illustrate TOrPEDO, we use a simple model
describing the states of a vacuum-cleaner robot that has to satisfy the requirements
in Fig. 2, specified through LTL formulae and English natural language.

Fig. 1: PKS of a vacuum-cleaner robot. [State diagram over the states OFF, IDLE, MOVING, and CLEANING. State labels recoverable from the figure: OFF: move = $\bot$, suck = $\bot$, on = $\bot$, reached = $\bot$; IDLE: move = $\bot$, suck = $\bot$, on = $\top$, reached = ?; MOVING: move = $\top$, suck = ?, on = $\top$, reached = ?.]


Fig. 2: Natural language and LTL formulation of the requirements of the vacuum-cleaner robot. G and W are the “globally” and “weak until” LTL operators.

LTL formulae

$\phi_1 = G(suck \rightarrow reached)$
$\phi_2 = G((\neg move)\ W\ on)$
$\phi_3 = G(((\neg move) \wedge on) \rightarrow suck)$
$\phi_4 = ((\neg suck)\ W\ (move \wedge (\neg suck)))$

Textual requirements

$\phi_1$: the robot is drawing dust (suck) only if it has reached the cleaning site.
$\phi_2$: the robot must be turned on before it can move.
$\phi_3$: if the robot is on and stationary ($\neg move$), it must be drawing dust (suck).
$\phi_4$: the robot must move before it is allowed to draw dust (suck).


The TOrPEDO framework is illustrated in Fig. 3 and carries out verification in 
four phases: INITIAL DESIGN, ANALYSIS, REVISION, and RE-CHECK. 

INITIAL DESIGN. The model of the system is expressed using a PKS $M$, which can be generated from other languages, along with the property of interest $\phi$, in LTL.

Running example. The PKS presented in Fig. 1 is defined over two atomic propositions representing actions that a robot can perform: move, i.e., the agent travels to the cleaning site; suck, i.e., the agent is drawing the dust, and two atomic propositions representing conditions that can trigger actions: on, true when the robot is turned on; reached, true when the robot has reached the cleaning site. The state OFF represents the robot being shut down, IDLE the robot being turned on w.r.t. a cleaning call, MOVING the robot reaching the cleaning site, and CLEANING the robot performing its duty. Each state is labeled with the actions move and suck and the conditions on and reached. Given an action or condition $\alpha$ and a state $s$, we use the notation: $\alpha = \top$ to indicate that $\alpha$ occurs when the robot is in state $s$; $\alpha = \bot$ to indicate that $\alpha$ does not occur when the robot is in state $s$; $\alpha = ?$ to indicate that there is uncertainty on whether $\alpha$ occurs when the robot is in state $s$.

ANALYSIS. TOrPEDO provides automated analysis support, which includes the following elements:


(i) Information about what is wrong in the current design. This information includes a definitive-counterexample, which indicates a behavior that depends on already performed design choices and violates the properties of interest. The definitive-counterexample ($\bot$-CE) can be used to produce a revised version $M'$ of $M$ that satisfies or possibly satisfies the property of interest.

Fig. 3: TOrPEDO structure. Continuous arrows represent inputs and outputs to
phases. Numbers are used to reference the image in the text.


(ii) Information about what is correct in the current design. This information includes definitive-topological proofs ($\top$-TP) that indicate a portion of the design that ensures satisfaction of the property.

(iii) Information about what could be wrong/correct in the current design, depending on how uncertainty is removed. This information includes: a possible-counterexample (?-CE), indicating a behavior (which depends on uncertain actions and conditions) that violates the properties of interest, and a possible-topological proof (?-TP), indicating a portion of the design that ensures the possible satisfaction of the property of interest.


In the following we will use the notation x-topological proofs or x-TP to indicate either definitive-topological or possible-topological proofs. The results returned by TOrPEDO for the different properties in our motivating example are presented in Table 1. Property $\phi_2$ is satisfied, $\phi_3$ is not. In those cases TOrPEDO returns respectively a definitive-topological proof and a definitive-counterexample. Since $\phi_1$ and $\phi_4$ are possibly satisfied, in both cases a possible-counterexample and a possible-topological proof are returned.

Running example. For $\phi_1$ the possible-counterexample shows a run that may violate the property of interest. The possible-topological proof in Table 1 shows that if OFF remains the only initial state (TPI), reached still holds in CLEANING, and suck does not hold in OFF and IDLE and remains unknown in MOVING (TPP), property $\phi_1$ remains possibly satisfied. In addition, all transitions must be preserved (TPT).³ Note that the proof highlights portions of the model that influence the property satisfaction. For example, by inspecting the proof, the designer understands that she can change the value of the proposition reached in all the states of the PKS, with the exception of the state CLEANING, without making the property violated.


³ The precise formal descriptions of x-topological proofs, TPI, TPP and TPT are
presented in Section 4.
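
Table 1: Results provided by TOrPEDO for properties $\phi_1$, $\phi_2$, $\phi_3$ and $\phi_4$. $\top$, $\bot$ and ? indicate that the property is satisfied, violated and possibly satisfied.

Property  Result  Output      Content
$\phi_1$  ?       ?-CE        OFF, IDLE, (MOVING)$^\omega$
                  ?-TP        TPP: $\langle$CLEANING, reached, $\top\rangle$, $\langle$OFF, suck, $\bot\rangle$, $\langle$IDLE, suck, $\bot\rangle$, $\langle$MOVING, suck, ?$\rangle$
                              TPT: $\langle$OFF, {OFF, IDLE}$\rangle$, $\langle$IDLE, {OFF, IDLE, MOVING}$\rangle$,
                                   $\langle$MOVING, {MOVING, CLEANING}$\rangle$, $\langle$CLEANING, {CLEANING, IDLE}$\rangle$
                              TPI: $\langle${OFF}$\rangle$
$\phi_2$  $\top$  $\top$-TP   TPP: $\langle$MOVING, on, $\top\rangle$, $\langle$CLEANING, on, $\top\rangle$, $\langle$OFF, move, $\bot\rangle$, $\langle$IDLE, move, $\bot\rangle$
                              TPT: $\langle$OFF, {OFF, IDLE}$\rangle$, $\langle$IDLE, {OFF, IDLE, MOVING}$\rangle$,
                                   $\langle$MOVING, {MOVING, CLEANING}$\rangle$, $\langle$CLEANING, {CLEANING, IDLE}$\rangle$
                              TPI: $\langle${OFF}$\rangle$
$\phi_3$  $\bot$  $\bot$-CE   OFF, IDLE$^\omega$
$\phi_4$  ?       ?-CE        OFF, (IDLE, MOVING, CLEANING, IDLE, OFF)$^\omega$
                  ?-TP        TPP: $\langle$OFF, suck, $\bot\rangle$, $\langle$IDLE, suck, $\bot\rangle$, $\langle$MOVING, suck, ?$\rangle$, $\langle$MOVING, move, $\top\rangle$
                              TPT: $\langle$OFF, {OFF, IDLE}$\rangle$, $\langle$IDLE, {OFF, IDLE, MOVING}$\rangle$
                              TPI: $\langle${OFF}$\rangle$
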


REVISION. Revisions can be obtained by changing some parts of the model: adding/removing states and transitions, or changing the labelling of propositions inside states. They are defined by considering the TP.

Running example. The designer may want to propose a revision that still does not violate properties $\phi_1$, $\phi_2$, and $\phi_4$. Thus, she changes the values of some atomic propositions: move becomes $\top$ in state CLEANING and reached becomes $\bot$ in state IDLE. Since $\phi_1$, $\phi_2$, and $\phi_4$ were previously not violated, TOrPEDO performs the RE-CHECK phase for each property.

RE-CHECK. The automated verification tool provided by TOrPEDO checks whether all the changes in the current model revision are compliant with the x-TPs, i.e., changes applied to the revised model do not include parts that had to be preserved according to the x-topological proof. If a property of interest is (possibly) satisfied in a previous model, and the revision of the model is compliant with the property x-TP, the designer has the guarantee that the property is (possibly) satisfied in the revision. Thus, she can perform another model revision round or approve the current design. Otherwise, TOrPEDO re-executes the ANALYSIS.

Running example. In the vacuum-cleaner case, the revision passes the RE- 
CHECK and the designer proceeds to a new revision phase. 


3 Background 


We present background notions by relying on standard notations for the selected 
formalisms (see for example [26,9,10,30]). 

Partial Kripke Structures are state machines that can be adopted
when the value of some propositions is uncertain on selected states.


Definition 1 ([9],[35]). A Partial Kripke Structure (PKS) $M$ is a tuple $\langle S, R, S_0, AP, L \rangle$, where: $S$ is a set of states; $R \subseteq S \times S$ is a left-total transition relation on $S$; $S_0$ is a set of initial states; $AP$ is a set of atomic propositions; $L : S \times AP \rightarrow \{\top, ?, \bot\}$ is a function that, for each state in $S$, associates a truth value to every atomic proposition in $AP$. A Kripke Structure (KS) $M$ is a PKS $\langle S, R, S_0, AP, L \rangle$, where $L : S \times AP \rightarrow \{\top, \bot\}$.


A PKS represents a system as a set of states and transitions between these states. 
Uncertainty on the AP is represented through the value ?. The model in Fig. 1 
is a PKS where propositions in AP are used to model actions and conditions. 
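
As an illustration of Definition 1, the following Python sketch encodes a PKS as a plain data structure. The class name `PKS`, its field and method names, and the two-state model `toy` are assumptions made for this example; they are not part of TOrPEDO.

```python
from dataclasses import dataclass

# Truth values; the integer encoding gives BOT < UNK < TOP.
BOT, UNK, TOP = 0, 1, 2


@dataclass(frozen=True)
class PKS:
    states: frozenset       # S
    transitions: frozenset  # R, a set of (source, target) pairs
    initial: frozenset      # S_0
    props: frozenset        # AP
    label: dict             # L, maps (state, proposition) to BOT, UNK or TOP

    def is_left_total(self):
        """R is left-total: every state has at least one successor."""
        return self.states <= {src for (src, _) in self.transitions}

    def is_fully_labelled(self):
        """L assigns a value to every proposition in every state."""
        return all((s, a) in self.label for s in self.states for a in self.props)

    def is_kripke(self):
        """A KS is a PKS whose labelling never uses the unknown value."""
        return UNK not in self.label.values()


# Hypothetical two-state model, used only to exercise the checks.
toy = PKS(
    states=frozenset({"s0", "s1"}),
    transitions=frozenset({("s0", "s1"), ("s1", "s1")}),
    initial=frozenset({"s0"}),
    props=frozenset({"a"}),
    label={("s0", "a"): BOT, ("s1", "a"): UNK},
)
```

Here `toy` is a PKS but not a KS, because the proposition `a` is unknown in `s1`.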

LTL properties. For KSs we consider the classical LTL semantics $[M \models \phi]$ over infinite words that associates to a model $M$ and a formula $\phi$ a truth value in the set $\{\bot, \top\}$. The interested reader may refer, for example, to [3]. Let $M$ be a KS and $\phi$ be an LTL property. We assume that the function CHECK, such that $\langle res, c \rangle$ = CHECK($M$, $\phi$), returns a tuple $\langle res, c \rangle$, where $res$ is the model checking result in $\{\top, \bot\}$ and $c$ is the counterexample if $res = \bot$, else an empty set.

The three-valued LTL semantics [9] $[M \models \phi]$ associates to a model $M$ and a formula $\phi$ a truth value in the set $\{\bot, ?, \top\}$ and is defined based on the information ordering $\top > ? > \bot$. The three-valued LTL semantics is defined by considering paths of the model $M$. A path $\pi$ is an infinite sequence of states $s_0, s_1, \ldots$ such that, for all $i \geq 0$, $(s_i, s_{i+1}) \in R$. We use the symbol $\pi^i$ to indicate the infinite sub-sequence of $\pi$ that starts at position $i$, and $Path(s)$ to indicate all the paths that start in the state $s$.


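Definition 2 ([9]). Let $M = \langle S, R, S_0, AP, L \rangle$ be a PKS, $\pi = s_0, s_1, \ldots$ be a path, and $\phi$ be an LTL formula. Then, the three-valued semantics $[(M, \pi) \models \phi]$ is defined inductively as follows:

\[
\begin{aligned}
[(M,\pi) \models p] &= L(s_0, p) \\
[(M,\pi) \models \neg\phi] &= comp([(M,\pi) \models \phi]) \\
[(M,\pi) \models \phi_1 \wedge \phi_2] &= \min([(M,\pi) \models \phi_1], [(M,\pi) \models \phi_2]) \\
[(M,\pi) \models X\,\phi] &= [(M,\pi^1) \models \phi] \\
[(M,\pi) \models \phi_1\,U\,\phi_2] &= \max_{j \geq 0}(\min(\{[(M,\pi^i) \models \phi_1] \mid i < j\} \cup \{[(M,\pi^j) \models \phi_2]\}))
\end{aligned}
\]

Let $M = \langle S, R, S_0, AP, L \rangle$ be a PKS, and $\phi$ be an LTL formula. Then $[M \models \phi] = \min(\{[(M,\pi) \models \phi] \mid \pi \in Path(s) \text{ and } s \in S_0\})$.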


The conjunction (resp. disjunction) is defined as the minimum (resp. maximum) of its arguments, following the order $\bot < ? < \top$. These functions are extended to sets with $\min(\emptyset) = \top$ and $\max(\emptyset) = \bot$. The comp operator maps $\top$ to $\bot$, $\bot$ to $\top$, and ? to ?. The semantics of the G (“globally”) and W (“weak until”) operators is defined as usual [28].
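
The three-valued connectives can be sketched in a few lines of Python. The names `TV`, `comp`, `tv_min` and `tv_max` are chosen for this illustration only; the integer values merely encode the order $\bot < ? < \top$.

```python
from enum import IntEnum


class TV(IntEnum):
    """Three truth values, ordered BOT < UNK < TOP."""
    BOT = 0
    UNK = 1
    TOP = 2


def comp(v):
    """Complement: swaps TOP and BOT and leaves UNK unchanged."""
    return TV(2 - v)


def tv_min(values):
    """Conjunction over a collection; the empty collection yields TOP."""
    return min(values, default=TV.TOP)


def tv_max(values):
    """Disjunction over a collection; the empty collection yields BOT."""
    return max(values, default=TV.BOT)
```

With this encoding, a conjunction of a true and an unknown operand is unknown, while a conjunction containing a false operand is false regardless of the unknowns.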


Model Checking. Checking KSs with respect to LTL properties can be done by using classical model checking procedures. For example, the model checking problem of property $\phi$ on a KS $M$ can be reduced to the satisfiability problem of the LTL formula $\Phi_M \wedge \neg\phi$, where $\Phi_M$ represents the behaviors of model $M$. If $\Phi_M \wedge \neg\phi$ is satisfiable, then $[M \models \phi] = \bot$, otherwise $[M \models \phi] = \top$.
Checking a PKS $M$ with respect to an LTL property $\phi$ considering the three-valued semantics is done by performing twice the classical model checking procedure for KSs [10], one considering an optimistic approximation $M_{opt}$ and one considering a pessimistic approximation $M_{pes}$. These two procedures consider the LTL formula $\phi' = F(\phi)$, where $F$ transforms $\phi$ with the following steps: (i) negate $\phi$; (ii) convert $\neg\phi$ in negation normal form; (iii) replace every subformula $\neg a$, where $a$ is an atomic proposition, with a new proposition $\overline{a}$.
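
The three steps of the transformation can be sketched as follows. This is a minimal illustration under stated assumptions: formulas are nested tuples, the complement proposition of `a` is named `a_bar`, the function is called `transform` (the paper's name $F$ would clash with the "eventually" operator), and only the operators listed in the two dictionaries are supported (weak until is omitted).

```python
# Formulas are tuples: ("ap", name), ("not", f), ("and", f, g), ("or", f, g),
# ("X", f), ("G", f), ("F", f), ("U", f, g), ("R", f, g).
DUAL_UNARY = {"X": "X", "G": "F", "F": "G"}
DUAL_BINARY = {"and": "or", "or": "and", "U": "R", "R": "U"}


def negate(f):
    """Return the negation of f in negation normal form."""
    op = f[0]
    if op == "ap":
        return ("not", f)
    if op == "not":
        return nnf(f[1])  # double negation cancels
    if op in DUAL_UNARY:
        return (DUAL_UNARY[op], negate(f[1]))
    if op in DUAL_BINARY:
        return (DUAL_BINARY[op], negate(f[1]), negate(f[2]))
    raise ValueError("unsupported operator: %s" % op)


def nnf(f):
    """Push negations down to the atomic propositions."""
    op = f[0]
    if op == "ap":
        return f
    if op == "not":
        return negate(f[1])
    return (op,) + tuple(nnf(g) for g in f[1:])


def rename(f):
    """Replace every negated atom by its complement proposition."""
    op = f[0]
    if op == "ap":
        return f
    if op == "not":  # in NNF the operand of "not" is always an atom
        return ("ap", f[1][1] + "_bar")
    return (op,) + tuple(rename(g) for g in f[1:])


def transform(phi):
    """Steps (i)-(iii): negate, convert to NNF, rename negated atoms."""
    return rename(negate(phi))


# G(suck -> reached), written as G(!suck | reached)
phi_1 = ("G", ("or", ("not", ("ap", "suck")), ("ap", "reached")))
```

Applied to `phi_1`, the sketch yields the formula "eventually (suck and reached_bar)", which contains no negation.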

To create the optimistic and pessimistic approximations $M_{opt}$ and $M_{pes}$, the PKS $M = \langle S, R, S_0, AP, L \rangle$ is first converted into its complement-closed version $M_c = \langle S, R, S_0, AP_c, L_c \rangle$ where the set of atomic propositions $AP_c = AP \cup \overline{AP}$ is such that $\overline{AP} = \{\overline{a} \mid a \in AP\}$. Atomic propositions in $\overline{AP}$ are called complement-closed propositions. Function $L_c$ is such that, for all $s \in S$ and $a \in AP$, $L_c(s, a) = L(s, a)$ and, for all $s \in S$ and $\overline{a} \in \overline{AP}$, $L_c(s, \overline{a}) = comp(L(s, a))$. The complement-closed PKS of the vacuum-cleaner agent in Fig. 1 presents eight propositional assignments in the state IDLE: $move = \bot$, $\overline{move} = \top$, $suck = \bot$, $\overline{suck} = \top$, $on = \top$, $\overline{on} = \bot$, $reached = ?$, and $\overline{reached} = ?$.

The two model checking runs for a PKS $M = \langle S, R, S_0, AP, L \rangle$ are based respectively on an optimistic ($M_{opt} = \langle S, R, S_0, AP_c, L_{opt} \rangle$) and a pessimistic ($M_{pes} = \langle S, R, S_0, AP_c, L_{pes} \rangle$) approximation of $M$'s related complement-closed $M_c = \langle S, R, S_0, AP_c, L_c \rangle$. Function $L_{pes}$ (resp. $L_{opt}$) is such that, for all $s \in S$ and $a \in AP_c$ with $L_c(s, a) \in \{\top, \bot\}$, $L_{pes}(s, a) = L_c(s, a)$ (resp. $L_{opt}(s, a) = L_c(s, a)$), and, for all $s \in S$ and $a \in AP_c$ with $L_c(s, a) = ?$, $L_{pes}(s, a) = \bot$ (resp. $L_{opt}(s, a) = \top$).
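
The complement closure and the two approximations can be illustrated on the labelling of the state IDLE. The following is a minimal sketch: the string encoding of the truth values, the `_bar` suffix for complement propositions, and all function names are assumptions of this example.

```python
BOT, UNK, TOP = "bot", "?", "top"
COMP = {TOP: BOT, BOT: TOP, UNK: UNK}


def complement_closure(label):
    """Build L_c: keep every a and add a_bar with the complemented value."""
    closed = {}
    for (state, prop), value in label.items():
        closed[(state, prop)] = value
        closed[(state, prop + "_bar")] = COMP[value]
    return closed


def approximate(closed, unknown_as):
    """Replace every unknown value by the given definite value."""
    return {key: (unknown_as if value == UNK else value)
            for key, value in closed.items()}


def optimistic(label):
    """L_opt: unknowns of the complement-closed labelling become TOP."""
    return approximate(complement_closure(label), TOP)


def pessimistic(label):
    """L_pes: unknowns of the complement-closed labelling become BOT."""
    return approximate(complement_closure(label), BOT)


# Labelling of the state IDLE of the running example.
idle = {
    ("IDLE", "move"): BOT,
    ("IDLE", "suck"): BOT,
    ("IDLE", "on"): TOP,
    ("IDLE", "reached"): UNK,
}
```

Note that in the optimistic approximation both `reached` and `reached_bar` are true in IDLE, and in the pessimistic one both are false; this is why the property must be rewritten without negations before the two checks.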

Let $A$ be a KS and $\phi$ be an LTL formula. Then $A \models^* \phi$ is true if no path that
satisfies the formula $F(\phi)$ is present in $A$.


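Theorem 1 ([9]). Let $\phi$ be an LTL formula, let $M = \langle S, R, S_0, AP, L \rangle$ be a PKS, and let $M_{pes}$ and $M_{opt}$ be the pessimistic and optimistic approximations of $M$'s relative complement-closed $M_c$. Then

\[
[M \models \phi] =
\begin{cases}
\top & \text{if } M_{pes} \models^* \phi \\
\bot & \text{if } M_{opt} \not\models^* \phi \\
? & \text{otherwise}
\end{cases}
\qquad (1)
\]
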
We call CHECK* the function that computes the result of operator $\models^*$. It takes as input either $M_{pes}$ or $M_{opt}$ and the property $F(\phi)$, and returns a tuple $\langle res, c \rangle$, where $res$ is the model checking result in $\{\top, \bot\}$, and $c$ can be an empty set (when $M$ satisfies $\phi$), a definitive-counterexample (when $M$ violates $\phi$), or a possible-counterexample (when $M$ possibly-satisfies $\phi$).


4 Revising models 


We define how models can be revised and the notion of topological proof, which is
used to describe why a property $\phi$ is (possibly) satisfied in a PKS $M$.


Initial design and revisions. In the initial design a preliminary PKS is manually defined or automatically obtained from other modeling formalisms. During a revision, a designer can add and remove states and transitions and/or change the labeling of the atomic propositions in the states of the PKS. Let $M = \langle S, R, S_0, AP, L \rangle$ and $M' = \langle S', R', S'_0, AP', L' \rangle$ be two PKSs. Then $M'$ is a revision of $M$ if and only if $AP \subseteq AP'$. Informally, the only constraint the designer has to respect during a revision is not to remove propositions from the set of atomic propositions. This condition is necessary to ensure that any property that can be evaluated on $M$ can also be evaluated on $M'$, i.e., every atomic proposition has a value in each of the states of the automaton. The deactivation of a proposition can instead be simulated by associating its value to $\bot$ in all the states of $M'$.


Topological proofs. The pursued proof is made of a set of clauses
specifying certain topological properties of $M$, which ensure that the property
is (possibly) satisfied.


Definition 3. Let $M = \langle S, R, S_0, AP, L \rangle$ be a PKS. A Topological Proof clause (TP-clause) $\gamma$ for $M$ is either:


- a Topological Proof Propositional clause (TPP-clause), i.e., a triple $\langle s, a, v \rangle$ where $s \in S$, $a \in AP$, and $v \in \{\top, ?, \bot\}$;

- a Topological Proof Transitions-from-state clause (TPT-clause), i.e., a pair $\langle s, T \rangle$, such that $s \in S$, $T \subseteq S$;

- a Topological Proof Initial-states clause (TPI-clause), i.e., an element $\langle S_0 \rangle$.


These clauses indicate topological properties of a PKS $M$. Informally, TPP-clauses constrain how states are labeled ($L$), TPT-clauses constrain how states are connected ($R$), and TPI-clauses constrain from which states the runs on the model begin ($S_0$). For example, in Table 1, for property $\phi_1$, $\langle$CLEANING, reached, $\top\rangle$ is a TPP-clause that constrains the atomic proposition reached to be labeled as true ($\top$) in the state CLEANING; $\langle$OFF, {OFF, IDLE}$\rangle$ is a TPT-clause that constrains the transition from OFF to OFF and from OFF to IDLE to not be removed; and $\langle${OFF}$\rangle$ is a TPI-clause that constrains the state OFF to remain the initial state of the system.

A state $s'$ is constrained: by a TPP-clause $\langle s, a, v \rangle$ if $s = s'$, by a TPT-clause
$\langle s, T \rangle$ if $s = s'$ or $s' \in T$, and by a TPI-clause $\langle S_0 \rangle$ if $s' \in S_0$.


Definition 4. Let $M = \langle S, R, S_0, AP, L \rangle$ be a PKS and let $\Omega$ be a set of TP-clauses for $M$. Then a PKS $\Omega$-related to $M$ is a PKS $M' = \langle S', R', S'_0, AP', L' \rangle$, such that the following conditions hold:


- $AP \subseteq AP'$;

- for every TPP-clause $\langle s, a, v \rangle \in \Omega$, $s \in S'$, $v = L'(s, a)$;

- for every TPT-clause $\langle s, T \rangle \in \Omega$, $s \in S'$, $T \subseteq S'$, $T = \{s' \in S' \mid (s, s') \in R'\}$;

- for every TPI-clause $\langle S_0 \rangle \in \Omega$, $S_0 = S'_0$.


Intuitively, a PKS $\Omega$-related to $M$ is a PKS obtained from $M$ by changing any topological aspect that does not impact on the set of TP-clauses $\Omega$. Any transition whose source state is not the source state of a transition included in the TPT-clauses can be added or removed from the PKS and any value of a proposition that is not constrained by a TPP-clause can be changed. States can be always added and they can be removed if they are not constrained by any TPT-, TPP-, or TPI-clause. Initial states cannot be changed if $\Omega$ contains a TPI-clause.
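
The conditions of Definition 4 amount to a syntactic check that can be sketched directly. The code below is an illustration, not TOrPEDO's implementation: the tuple encoding of clauses, the function name `is_related`, and the dictionary layout of the revised model are assumptions. The clauses are the ?-TP for the first property in Table 1; the revised labelling lists only the labels the check inspects plus the two values changed in the running example.

```python
def is_related(revised, ap, clauses):
    """Check whether the revised PKS is related to the original one
    with respect to the given TP-clauses (conditions of Definition 4)."""
    states, trans, init = revised["S"], revised["R"], revised["S0"]
    if not set(ap) <= set(revised["AP"]):
        return False
    for clause in clauses:
        kind = clause[0]
        if kind == "TPP":
            _, s, a, v = clause
            if s not in states or revised["L"].get((s, a)) != v:
                return False
        elif kind == "TPT":
            _, s, targets = clause
            successors = {t for (src, t) in trans if src == s}
            if s not in states or not targets <= states or successors != targets:
                return False
        elif kind == "TPI":
            if clause[1] != init:
                return False
    return True


AP = {"move", "suck", "on", "reached"}
TP_PHI_1 = [
    ("TPP", "CLEANING", "reached", "top"), ("TPP", "OFF", "suck", "bot"),
    ("TPP", "IDLE", "suck", "bot"), ("TPP", "MOVING", "suck", "?"),
    ("TPT", "OFF", {"OFF", "IDLE"}), ("TPT", "IDLE", {"OFF", "IDLE", "MOVING"}),
    ("TPT", "MOVING", {"MOVING", "CLEANING"}),
    ("TPT", "CLEANING", {"CLEANING", "IDLE"}),
    ("TPI", {"OFF"}),
]
# Revision of the running example: move becomes true in CLEANING and
# reached becomes false in IDLE; everything else is unchanged.
revision = {
    "S": {"OFF", "IDLE", "MOVING", "CLEANING"},
    "R": {("OFF", "OFF"), ("OFF", "IDLE"), ("IDLE", "OFF"), ("IDLE", "IDLE"),
          ("IDLE", "MOVING"), ("MOVING", "MOVING"), ("MOVING", "CLEANING"),
          ("CLEANING", "CLEANING"), ("CLEANING", "IDLE")},
    "S0": {"OFF"},
    "AP": AP,
    "L": {("CLEANING", "reached"): "top", ("OFF", "suck"): "bot",
          ("IDLE", "suck"): "bot", ("MOVING", "suck"): "?",
          ("CLEANING", "move"): "top", ("IDLE", "reached"): "bot"},
}
```

The revision above touches no constrained label, so the check succeeds. Resolving the unknown value of suck in MOVING, by contrast, changes a label named in a TPP-clause, so the check fails and the analysis would have to be run again.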


Definition 5. Let $M = \langle S, R, S_0, AP, L \rangle$ be a PKS, let $\phi$ be an LTL property, let $\Omega$ be a set of TP-clauses, and let $x$ be a truth value in $\{\top, ?\}$. A set of TP-clauses $\Omega$ is an x-topological proof (or x-TP) for $\phi$ in $M$ if: (i) $[M \models \phi] = x$; and (ii) every PKS $M'$ $\Omega$-related to $M$ is such that $[M' \models \phi] \geq x$.


Intuitively, an x-topological proof is a set of TP-clauses $\Omega$ such that every PKS $M'$ that satisfies the conditions specified in Definition 4 is such that $[M' \models \phi] \geq x$. We call $\top$-TP a definitive-topological proof and ?-TP a possible-topological proof. In Definition 5, the operator $\geq$ assumes that values $\top$, ?, $\bot$ are ordered considering the classical information ordering $\top > ? > \bot$ among the truth values [9].

Regarding the PKS in Fig. 1, Table 1 shows two ?-TPs for properties $\phi_1$ and
$\phi_4$, and one $\top$-TP for property $\phi_2$.


Definition 6. Let $M$ and $M'$ be two PKSs, let $\phi$ be an LTL property, and let
$\Omega$ be an x-TP. Then $M'$ is an $\Omega_x$-revision of $M$ if $M'$ is $\Omega$-related to $M$.


Intuitively, since the $\Omega_x$-revision $M'$ of $M$ is such that $M'$ is $\Omega$-related to $M$, it is obtained by changing the model $M$ while preserving the statements that are specified in the x-TP. A revision $M'$ of $M$ is compliant with the x-TP for a property $\phi$ in $M$ if it is an $\Omega_x$-revision of $M$.


Theorem 2. Let $M$ be a PKS, let $\phi$ be an LTL property such that $[M \models \phi] = \top$, and let $\Omega$ be a $\top$-TP. Then every $\Omega_\top$-revision $M'$ is such that $[M' \models \phi] = \top$. Let $M$ be a PKS, let $\phi$ be an LTL property such that $[M \models \phi] = ?$, and let $\Omega$ be a ?-TP. Then every $\Omega_?$-revision $M'$ is such that $[M' \models \phi] \in \{\top, ?\}$.


Proof Sketch. We prove the first statement of the Theorem; the proof of the second statement is obtained by following the same steps.

If $\Omega$ is a $\top$-TP, it is a $\top$-TP for $\phi$ in $M'$, since $M'$ is an $\Omega_\top$-revision of $M$ (by Definition 6). Furthermore, since $\Omega$ is a $\top$-TP for $\phi$ in $M'$, then $[M' \models \phi] \geq \top$ (by Definition 5).


5 TOrPEDO automated support 


This section describes the algorithms that support the ANALYSIS and RE-CHECK 
phases of TOrPEDO. 

ANALYSIS. To analyze a PKS $M = \langle S, R, S_0, AP, L \rangle$, TOrPEDO uses the three-valued model checking framework based on Theorem 1. The model checking result is provided as output by the ANALYSIS phase of TOrPEDO, whose behavior is described in Algorithm 1.

Algorithm 1: The ANALYSIS algorithm.

 1: function ANALYZE($M$, $\phi$)
 2:     $\langle res, c \rangle$ = CHECK*($M_{opt}$, $\phi$)
 3:     if $res == \bot$ then return $\langle \bot, \{c\} \rangle$
 4:     else
 5:         $\langle res', c' \rangle$ = CHECK*($M_{pes}$, $\phi$)
 6:         if $res' == \top$ then return
 7:             $\langle \top, \{$CTP_KS($M$, $M_{pes}$, $F(\phi)$)$\} \rangle$
 8:         else
 9:             return
10:             $\langle ?, \{c',$ CTP_KS($M$, $M_{opt}$, $F(\phi)$)$\} \rangle$
11:         end if
12:     end if
13: end function

Algorithm 2: Compute Topological Proofs.

 1: function CTP_KS($M$, $A$, $\psi$)
 2:     $\eta(C_A \cup \{\psi\})$ = SYS2LTL($A$, $\psi$)
 3:     $\eta(C'_A \cup \{\psi\})$ = GETUC($\eta(C_A \cup \{\psi\})$)
 4:     TP = GETTP($M$, $\eta(C'_A \cup \{\psi\})$)
 5:     return TP
 6: end function


The algorithm returns a tuple $\langle x, y \rangle$, where $x$ is the verification result and $y$ is a set containing the counterexample, the topological proof or both of them. The algorithm first checks whether the optimistic approximation $M_{opt}$ of the PKS $M$ satisfies property $\phi$ (Line 2). If this is not the case, the property is violated by the PKS and the definitive-counterexample $c$ ($\bot$-CE) is returned (Line 3). Then, it checks whether the pessimistic approximation $M_{pes}$ of the PKS $M$ satisfies property $\phi$ (Line 5). If this is the case, the property is satisfied and the value $\top$ is returned along with the definitive-topological proof ($\top$-TP) computed by the CTP_KS procedure applied on the pessimistic approximation $M_{pes}$ and the property $F(\phi)$ (Line 7).


If this is not the case, the property is possibly satisfied and the value ? is returned along with the possible-counterexample $c'$ (?-CE) and the possible-topological proof (?-TP) computed by the CTP_KS procedure applied to $M_{opt}$ and $F(\phi)$ (Line 10).
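
The control flow of Algorithm 1 can be sketched as follows. The model checker and the proof computation are passed in as the callbacks `check_star` and `ctp_ks`; both names and their signatures are assumptions of this sketch, as is the dictionary used for the returned artefacts. The sketch hands the property to the callbacks unchanged and leaves the transformation $F$ to them.

```python
BOT, UNK, TOP = "bot", "?", "top"


def analyze(m, m_opt, m_pes, phi, check_star, ctp_ks):
    """Three-valued analysis following the structure of Algorithm 1.

    check_star(ks, phi) returns (TOP or BOT, counterexample or None).
    ctp_ks(m, ks, phi) returns a topological proof.
    """
    res, cex = check_star(m_opt, phi)
    if res == BOT:
        # Violated even under the optimistic reading of the unknowns.
        return BOT, {"counterexample": cex}
    res_pes, cex_pes = check_star(m_pes, phi)
    if res_pes == TOP:
        # Satisfied even under the pessimistic reading of the unknowns.
        return TOP, {"proof": ctp_ks(m, m_pes, phi)}
    # Satisfied optimistically but not pessimistically: possibly satisfied.
    return UNK, {"counterexample": cex_pes, "proof": ctp_ks(m, m_opt, phi)}
```

Passing the two procedures as parameters keeps the sketch independent of any particular model checker; only the order of the two checks and the choice of approximation for the proof are fixed.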


The procedure CTP_KS (Compute Topological Proofs) to compute x-TPs is described in Algorithm 2. It takes as input a PKS $M$, its optimistic/pessimistic approximation, denoted generically as the KS $A$, and an LTL formula $\psi$, satisfied in $A$, corresponding to the transformed property $F(\phi)$ (see Section 3). The three steps of the algorithm are described in the following.


SYS2LTL. Encoding of the KS $A$ and the LTL formula $\psi$ into an LTL formula $\eta(C_A \cup \{\psi\})$. The KS $A = \langle S, R, S_0, AP_c, L_A \rangle$ (where $L_A$ is the optimistic or pessimistic function, $L_{opt}$ or $L_{pes}$, as defined in Section 3) and the LTL formula $\psi$ are used to generate an LTL formula

\[
\eta(C_A \cup \{\psi\}) = \bigwedge_{c \in (C_A \cup \{\psi\})} c
\]

Table 2: Rules to transform the KS in LTL formulae.

$c_i = \bigvee_{s \in S_0} p(s)$
The KS is initially in one of its initial states.

$C_R = \{G(\neg p(s) \vee X(\bigvee_{(s,s') \in R} p(s'))) \mid s \in S\}$
If the KS is in state $s$ in the current instant, in the next instant it is in one of the successors $s'$ of $s$.

$C_{L\top,A} = \{G(\neg p(s) \vee a) \mid s \in S, a \in AP_c, L_A(s, a) = \top\}$
If the KS is in state $s$ s.t. $L_A(s, a) = \top$, the atomic proposition $a$ is true.

$C_{L\bot,A} = \{G(\neg p(s) \vee \neg a) \mid s \in S, a \in AP_c, L_A(s, a) = \bot\}$
If the KS is in state $s$ s.t. $L_A(s, a) = \bot$, the atomic proposition $a$ is false.

$C_{REG} = \{G(\neg p(s) \vee \neg p(s')) \mid s, s' \in S \text{ and } s \neq s'\}$
The KS is in at most one state at any time.


where $C_A$ are sets of LTL clauses obtained from the KS $A$.⁴ The set of clauses that encodes the KS is $C_A = C_{KS} \cup C_{REG}$, where $C_{KS} = \{c_i\} \cup C_R \cup C_{L\top,A} \cup C_{L\bot,A}$ and $c_i$, $C_R$, $C_{L\top,A}$ and $C_{L\bot,A}$ are defined as specified in Table 2. Note that the clauses in $C_A$ are defined on the set of atomic propositions $AP_S = AP_c \cup \{p(s) \mid s \in S\}$, i.e., $AP_S$ includes an additional atomic proposition $p(s)$ for each state $s$, which is true when the KS is in state $s$. The size of the encoding depends on the cardinality of $C_A$, i.e., in the worst case, $1 + |S| + |S| \times |AP_c| + |S| \times |S|$.
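
To make the encoding and its size concrete, the sketch below generates the clause sets of Table 2 as strings for a small KS. The textual clause syntax (`!`, `|`, `X`, `G`), the function name `sys2ltl`, and the two-state KS are assumptions of this example; the labelling is given as booleans because a KS has no unknown values.

```python
def sys2ltl(states, trans, init, label):
    """Return (c_i, C_R, C_Ltop, C_Lbot, C_REG) as strings.

    label maps (state, proposition) to True or False for every
    proposition of the complement-closed set AP_c.
    """
    def disj(names):
        return " | ".join("p(%s)" % n for n in sorted(names))

    c_i = disj(init)
    c_r = ["G(!p(%s) | X(%s))" % (s, disj(t for (src, t) in trans if src == s))
           for s in sorted(states)]
    c_top = ["G(!p(%s) | %s)" % (s, a)
             for (s, a), v in sorted(label.items()) if v]
    c_bot = ["G(!p(%s) | !%s)" % (s, a)
             for (s, a), v in sorted(label.items()) if not v]
    c_reg = ["G(!p(%s) | !p(%s))" % (s, t)
             for s in sorted(states) for t in sorted(states) if s != t]
    return c_i, c_r, c_top, c_bot, c_reg


# Hypothetical two-state KS over AP_c = {a, a_bar}.
S = {"s0", "s1"}
R = {("s0", "s1"), ("s1", "s1")}
S0 = {"s0"}
L_A = {("s0", "a"): False, ("s0", "a_bar"): True,
       ("s1", "a"): True, ("s1", "a_bar"): False}

c_i, c_r, c_top, c_bot, c_reg = sys2ltl(S, R, S0, L_A)
size = 1 + len(c_r) + len(c_top) + len(c_bot) + len(c_reg)
```

For this KS the encoding has 9 clauses, within the worst-case bound $1 + 2 + 2 \times 2 + 2 \times 2 = 11$; the bound is not tight because a state is never paired with itself in the last clause set.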


GETUC. Computation of the Unsatisfiable Core (UC) $\eta(C'_A \cup \{\psi\})$ of $\eta(C_A \cup \{\psi\})$. Since the property $\psi$ is satisfied on $A$, $\eta(C_A \cup \{\psi\})$ is unsatisfiable and the computation of its UC is performed by using the PLTL-MUP approach [58].

Let $C = \{\varphi_1, \varphi_2, \ldots, \varphi_n\}$ be a set of LTL formulae, such that $\eta(C) = \bigwedge_{\varphi \in C} \varphi$ is unsatisfiable, then the function $\eta(C')$ = GETUC($\eta(C)$) returns an unsatisfiable core $\eta(C') = \bigwedge_{\varphi \in C'} \varphi$ of $\bigwedge_{\varphi \in C} \varphi$. In our case, since the property holds on the KS $A$, GETUC($\eta(C_A \cup \{\psi\})$) returns a subset of clauses $\eta(C'_A \cup \{\psi\})$, where $C'_A = C'_{KS} \cup C'_{REG}$ such that $C'_{KS} \subseteq C_{KS}$ and $C'_{REG} \subseteq C_{REG}$.
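
The idea of an unsatisfiable core can be illustrated on propositional clauses. The sketch below is a stand-in and not the PLTL-MUP procedure: it works on propositional rather than LTL clauses, decides satisfiability by brute force, and shrinks the clause set by deletion. All names and the example clauses are assumptions of this illustration.

```python
from itertools import product


def satisfiable(clauses):
    """Brute-force propositional check.

    A clause is a frozenset of literals; a literal is (name, polarity).
    """
    names = sorted({name for clause in clauses for (name, _) in clause})
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if all(any(env[n] == pol for (n, pol) in clause) for clause in clauses):
            return True
    return False


def get_uc(clauses):
    """Deletion-based core: drop each clause whose removal keeps the
    remaining set unsatisfiable. The result is a minimal core."""
    assert not satisfiable(clauses), "the input must be unsatisfiable"
    core = list(clauses)
    for clause in list(clauses):
        trial = [c for c in core if c is not clause]
        if not satisfiable(trial):
            core = trial
    return core


# a, (!a | b), !b are jointly unsatisfiable; c is irrelevant.
example = [
    frozenset({("a", True)}),
    frozenset({("a", False), ("b", True)}),
    frozenset({("b", False)}),
    frozenset({("c", True)}),
]
```

On `example` the core consists of the first three clauses; the clause on `c` plays no role in the contradiction and is dropped, in the same way that model clauses irrelevant to the property are absent from $C'_A$.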


Lemma 1. Let $A$ be a KS and let $\psi$ be an LTL property. Let also $\eta(C_A \cup \{\psi\})$ be the LTL formula computed in the step SYS2LTL of the algorithm. Then, any unsatisfiable core $\eta(C'_A \cup \{\psi\})$ of $\eta(C_A \cup \{\psi\})$ is such that $C'_A \subseteq C_A$.


Proof Sketch. As the property $\phi$ is satisfied by $M$, the LTL formula $\eta(C_A \cup \{\psi\})$, where $\psi = F(\phi)$, must be unsatisfiable as discussed in Section 3. Indeed, $F(\phi)$ simply performs some proposition renaming on the negation of the formula $\phi$.

As $C_A$ encodes a KS, $\bigwedge_{c \in C_A} c$ is satisfiable. As such, the unsatisfiability is caused by the contradiction of some of the clauses in $C_A$ and the property $\psi$, and as a consequence $\psi$ must be a part of the UC.

GETTP. Analysis of $C'_A$ and extraction of the topological proof. The set $C'_A$,
where $C'_A = C'_{KS} \cup C'_{REG}$, contains clauses regarding the KS ($C'_{KS}$ and $C'_{REG}$)


⁴ Note that this formula is equivalent to $\Phi_M \land \neg\phi$ used in Section 3, as
$\Phi_M$ is generated by the clauses in $C_A$ and $\neg\phi$ from $\psi$.




Table 3: Rules to extract the TP-clauses from the UC LTL formula. 


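| LTL clause | TP clause | Type |
|---|---|---|
| $c_i = \bigvee_{s \in S_0} p(s)$ | $\langle S_0 \rangle$ | TPI |
| $G(\neg p(s) \lor X(\bigvee_{(s,s') \in R} p(s')))$ | $\langle s, T \rangle$ where $T = \{s' \mid (s,s') \in R\}$ | TPT |
| $G(\neg p(s) \lor a)$ | $\langle s, a, L(s,a) \rangle$ | TPP |
| $G(\neg p(s) \lor \neg a)$ | $\langle s, a, comp(L(s,a)) \rangle$ | TPP |
| $G(\neg p(s) \lor \overline{a})$ | $\langle s, a, comp(L(s,a)) \rangle$ | TPP |
| $G(\neg p(s) \lor \neg\overline{a})$ | $\langle s, a, L(s,a) \rangle$ | TPP |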


and the property of interest ($\psi$) that made the formula $\eta(C'_A \cup \{\psi\})$
unsatisfiable. Since we are interested in clauses related to the KS that caused
unsatisfiability, we extract the topological proof $\Omega$, whose topological
proof clauses are obtained from the clauses in $C'_{KS}$ as specified in Table 3.
Since the set of atomic propositions of $A$ is $AP_A = AP \cup \overline{AP}$, in
the table we use $a$ for propositions in $AP$ and $\overline{a}$ for propositions
in $\overline{AP}$.

The elements in $C'_{REG}$ are not considered in the TP computation as, given
an LTL clause $G(\neg p(s) \lor \neg p(s'))$, either state $s$ or $s'$ is
constrained by other TP-clauses that will be preserved in the model revisions.


Lemma 2. Let $A$ be a KS and let $\psi$ be an LTL property. Let also
$\eta(C_A \cup \{\psi\})$ be the LTL formula computed in the step SYS2LTL of the
algorithm, where $C_A = C_{REG} \cup C_{KS}$, and let $\eta(C'_A \cup \{\psi\})$ be
an unsatisfiable core, where $C'_A = C'_{REG} \cup C'_{KS}$. Then, if
$G(\neg p(s) \lor \neg p(s')) \in C'_{REG}$, either:


(i) there exists an LTL clause in $C'_{KS}$ that constrains state $s$ (or state $s'$); or

(ii) $\eta(C''_A \cup \{\psi\})$, s.t. $C''_A = C'_A \setminus \{G(\neg p(s) \lor \neg p(s'))\}$, is an UC of $\eta(C'_A \cup \{\psi\})$.


Proof Sketch. We indicate $G(\neg p(s) \lor \neg p(s'))$ as $\tau(s, s')$. Assume
per absurdum that conditions (i) and (ii) are both violated, i.e., no LTL clause
in $C'_{KS}$ constrains state $s$ or $s'$ and $\eta(C''_A \cup \{\psi\})$ is not an
unsatisfiable core of $\eta(C'_A \cup \{\psi\})$. Since $\eta(C''_A \cup \{\psi\})$
is not an unsatisfiable core of $\eta(C'_A \cup \{\psi\})$, $\eta(C''_A \cup \{\psi\})$
is satisfiable, as $C''_A \subset C'_A$. Since $\eta(C''_A \cup \{\psi\})$ is
satisfiable, $\eta(C'_A \cup \{\psi\})$ s.t. $C'_A = C''_A \cup \{\tau(s, s')\}$ must
also be satisfiable. Indeed, there is no LTL clause that constrains state $s$
(or state $s'$) and, in order to generate a contradiction, the added LTL clause
must generate it using the LTL clauses obtained from the LTL property $\psi$.
This is a contradiction. Thus, condition (i) or condition (ii) must be satisfied.


The ANALYZE procedure in Algorithm 1 obtains a TP for a PKS by
first computing the related optimistic or pessimistic approximation (i.e., a KS) 
and then exploiting the computation of the TP for such KS. 


Theorem 3. Let $M = \langle S, R, S_0, AP, L \rangle$ be a PKS, let $\phi$ be an LTL
property, and let $x \in \{\top, ?\}$ be an element such that $[M \models \phi] = x$.
If the procedure ANALYZE, applied to the PKS $M$ and the LTL property $\phi$,
returns a TP $\Omega$, this is an $x$-TP for $\phi$ in $M$.


Proof Sketch. Assume that the ANALYZE procedure returns the value $\top$ and
a $\top$-TP. We show that every $\Omega$-related PKS $M'$ is such that
$[M' \models \phi] \geq x$ (Definition 5). If ANALYZE returns the value $\top$, it
must be that $M_{pes} \models^{*} \phi$ by Lines 5 and 7 of Algorithm 1.
Furthermore, by Line 7, $\psi = F(\phi)$ and $A = M_{pes}$.

Let $N = \langle S_N, R_N, S_{0,N}, AP_N, L_N \rangle$ be a PKS $\Omega$-related to
$M$. Let $\eta(C_A \cup \{\psi\})$ be the LTL formula associated with $A$ and $\psi$
and let $\eta(C_B \cup \{\psi\})$ be the LTL formula associated with $B = N_{pes}$
and $\psi$. Let us consider an UC $\eta(C'_A \cup \{\psi\})$ of
$\eta(C_A \cup \{\psi\})$, where $C'_A = C'_{KS} \cup C'_{REG}$,
$C'_{KS} \subseteq C_{KS}$ and $C'_{REG} \subseteq C_{REG}$. We show that
$C'_A \subseteq C_B$, i.e., the UC is also an UC for the LTL formula associated
with the approximation $B$ of the PKS $N$.


– $C'_A \subseteq C_B$, i.e., $(C'_{KS} \cup C'_{REG}) \subseteq C_B$. By Lemma 2 we
  can avoid considering $C'_{REG}$. By construction (see Line 2 of Algorithm 2)
  any clause $c \in C'_{KS}$ belongs to one rule among $CR$, $CL_{pes,\top}$,
  $CL_{pes,\bot}$ or $c = c_i$:

  • if $c = c_i$ then, by the rules in Table 3, there is a TPI-clause
    $\langle S_0 \rangle \in \Omega$. By Definition 4, $S_0 = S'_0$. Thus,
    $c_i \in C_B$ since $N$ is $\Omega$-related to $M$.

  • if $c \in CR$ then, by the rules in Table 3, there is a TPT-clause
    $\langle s, T \rangle \in \Omega$ where $s \in S$ and $T \subseteq S$. By
    Definition 4, $T = \{s' \in S' \mid (s,s') \in R'\}$. Thus, $c \in C_B$ since
    $N$ is $\Omega$-related to $M$.

  • if $c \in CL_{A,\top}$ or $c \in CL_{A,\bot}$, by the rules in Table 3, there is
    a TPP-clause $\langle s, a, L(s,a) \rangle \in \Omega$ where $s \in S$ and
    $a \in AP$. By Definition 4, $L'(s,a) = L(s,a)$. Thus, $c \in C_B$ since $N$
    is $\Omega$-related to $M$.


Since $N$ is $\Omega$-related to $M$, it has preserved the elements of $\Omega$.
Thus $\eta(C'_A \cup \{\psi\})$ is also an UC of $C_B$. It follows that
$[N \models \phi] = \top$.


The proof for the case in which the ANALYZE procedure returns the value ?
and a ?-TP can be derived from the first case.


RE-CHECK. Let $M = \langle S, R, S_0, AP, L \rangle$ be a PKS. The RE-CHECK
algorithm verifies whether a revision $M'$ of $M$ is an $\Omega$-revision. Let
$\Omega$ be an $x$-TP for $\phi$ in $M$, and let
$M' = \langle S', R', S'_0, AP', L' \rangle$ be a revision of $M$.
The RE-CHECK algorithm returns true if and only if the following holds:


– $AP \subseteq AP'$;

– for every TPP-clause $\langle s, a, v \rangle \in \Omega$, $s \in S'$, $v = L'(s,a)$;

– for every TPT-clause $\langle s, T \rangle \in \Omega$, $s \in S'$, $T \subseteq S'$, $T = \{s' \in S' \mid (s,s') \in R'\}$;

– for every TPI-clause $\langle S_0 \rangle \in \Omega$, $S_0 = S'_0$.


These conditions can be verified by a simple syntactic check on the PKS. 
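
The four conditions can be sketched as a direct membership and equality check. The Python code below is a minimal illustration, not the Scala implementation of TOrPEDO: the dictionary layout of the proof and of the revised PKS, the tuple encoding of the TP-clauses, and the function name `re_check` are assumptions made for this example, and state membership is checked against the state set of the revised model.

```python
def re_check(proof, revision):
    """Return True iff the revised PKS satisfies the four conditions above.

    proof:    dict with "ap" (propositions of the original PKS) and
              "clauses", a list of tuples:
                ("TPP", s, a, v)  labelling clause <s, a, v>
                ("TPT", s, T)     transition clause <s, T>
                ("TPI", S0)       initial-state clause <S0>
    revision: dict with "states", "trans" (set of pairs), "init", "ap"
              and "label" (dict mapping (s, a) to "T", "F" or "?").
    """
    states = set(revision["states"])
    # AP must be contained in AP'.
    if not set(proof["ap"]) <= set(revision["ap"]):
        return False
    for clause in proof["clauses"]:
        kind = clause[0]
        if kind == "TPP":
            _, s, a, v = clause
            if s not in states or revision["label"].get((s, a)) != v:
                return False
        elif kind == "TPT":
            _, s, targets = clause
            successors = {t for (u, t) in revision["trans"] if u == s}
            if s not in states or not set(targets) <= states:
                return False
            if successors != set(targets):
                return False
        elif kind == "TPI":
            _, init = clause
            if set(init) != set(revision["init"]):
                return False
        else:
            raise ValueError(f"unknown clause kind: {kind}")
    return True


# Hypothetical proof and revision used only for illustration.
proof = {
    "ap": {"a"},
    "clauses": [
        ("TPI", frozenset({"s0"})),
        ("TPT", "s0", frozenset({"s1"})),
        ("TPP", "s1", "a", "T"),
    ],
}
revision = {
    "states": {"s0", "s1", "s2"},
    "trans": {("s0", "s1"), ("s1", "s2"), ("s2", "s2")},
    "init": {"s0"},
    "ap": {"a", "b"},
    "label": {("s1", "a"): "T", ("s2", "b"): "?"},
}
accepted = re_check(proof, revision)
# Adding a transition out of s0 changes the successor set fixed by the TPT-clause.
broken = dict(revision, trans=revision["trans"] | {("s0", "s2")})
rejected = re_check(proof, broken)
print(accepted, rejected)
```

The check never inspects the property or runs a model checker: it only compares the clauses of the proof with the corresponding elements of the revised model.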


Lemma 3. Let $M = \langle S, R, S_0, AP, L \rangle$ and
$M' = \langle S', R', S'_0, AP', L' \rangle$ be two PKSs and let $\Omega$ be an
$x$-TP. The RE-CHECK algorithm returns true if and only if $M'$ is
$\Omega$-related to $M$.


Proof Sketch. If $M'$ is $\Omega$-related to $M$, the conditions of Definition 4
hold. Each of these conditions is a condition of the RE-CHECK algorithm. Thus, if
$M'$ is $\Omega$-related to $M$, RE-CHECK returns true. Conversely, if RE-CHECK
returns true, each condition of the algorithm is satisfied and, since each of these
conditions corresponds to a condition of Definition 4, $M'$ is $\Omega$-related to $M$.


This Lemma allows us to prove the following Theorem. 
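Table 4: Properties considered in the evaluation


$\phi_1$: $G(\neg \mathit{OFFHOOK}) \lor (\neg \mathit{OFFHOOK} \; U \; \mathit{CONNECTED})$
$\phi_2$: $\neg \mathit{OFFHOOK} \; W \; (\neg \mathit{OFFHOOK} \land \mathit{CONNECTED})$
$\phi_3$: $G(\mathit{CONNECTED} \rightarrow \mathit{ACTIVE})$
$\phi_4$: $G(\mathit{OFFHOOK} \land \mathit{ACTIVE} \land \neg \mathit{CONNECTED} \rightarrow X(\mathit{ACTIVE}))$
$\phi_5$: $G(\mathit{CONNECTED} \rightarrow X(\mathit{ACTIVE}))$

$\psi_1$: $G(\mathit{CONNECTED} \rightarrow \mathit{ACTIVE})$
$\psi_2$: $G(\mathit{CONNECTED} \rightarrow X(\mathit{ACTIVE}))$
$\psi_3$: $G(\mathit{CONNECTED}) \lor (\mathit{CONNECTED} \; U \; \neg \mathit{OFFHOOK})$
$\psi_4$: $\neg \mathit{CONNECTED} \; W \; (\neg \mathit{CONNECTED} \land \mathit{OFFHOOK})$
$\psi_5$: $G(\mathit{CALLEE\_SEL} \rightarrow \mathit{OFFHOOK})$

$\eta_1$: $G((\mathit{OFFHOOK} \land \mathit{CONNECTED}) \rightarrow X(\mathit{OFFHOOK} \lor \neg \mathit{CONNECTED}))$
$\eta_2$: $G(\mathit{CONNECTED}) \lor (\mathit{CONNECTED} \; W \; \neg \mathit{OFFHOOK})$
$\eta_3$: $\neg \mathit{CONNECTED} \; W \; (\neg \mathit{CONNECTED} \land \mathit{OFFHOOK})$
$\eta_4$: $G(\mathit{CALLEE\_FREE} \lor \mathit{LINE\_SEL})$
$\eta_5$: $G(X(\mathit{OFFHOOK}) \land \neg \mathit{CONNECTED})$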


Theorem 4. Let $M$ be a PKS, let $\phi$ be a property, let $\Omega$ be an $x$-TP for
$\phi$ in $M$ where $x \in \{\top, ?\}$, and let $M'$ be a revision of $M$. The
RE-CHECK algorithm returns true if and only if $M'$ is an $\Omega$-revision of $M$.


Proof Sketch. By applying Lemma 3, the RE-CHECK algorithm returns true if
and only if $M'$ is $\Omega$-related to $M$. By Definition 6, since $\Omega$ is an
$x$-TP, the RE-CHECK algorithm returns true if and only if $M'$ is an
$\Omega$-revision of $M$.


The ANALYSIS and RE-CHECK algorithms assume that the three-valued LTL 
semantics is considered. While the thorough LTL semantics [10] has been shown 
to provide an evaluation of formulae that better reflects the natural intuition, 
the two semantics coincide in the case of self-minimizing LTL formulae. In this 
case, our results are correct also w.r.t. the thorough semantics. Note that, as 
shown in [24], most practically useful LTL formulae are self-minimizing. Future 
work will consider how to extend the ANALYSIS and RE-CHECK to completely 
support the thorough LTL semantics. 


6 Evaluation 


We implemented TOrPEDO as a standalone Scala application and made it
available online [62]. We evaluated how the ANALYSIS helps in creating model
revisions and how often running the RE-CHECK algorithm allows the user
to avoid re-executing the ANALYSIS algorithm from scratch.

We considered a set of example PKSs proposed in the literature to evaluate
the χChek [20] model checker and defined a set of properties (see Table 4)
inspired by the original properties and based on the LTL property patterns [18].⁵

ANALYSIS support. We checked how the size of the proofs compares w.r.t.
the size of the original models. Intuitively, since the proofs represent constraints


5 The original properties used in the examples were specified in Computation Tree 
Logic (CTL), which is currently not supported by TOrPEDO. 




Table 5: Cardinalities $|S|$, $|R|$, $|AP|$, $|?|$, and $|M|$ are those of the
evaluated model $M$. $|\Omega_p|_x$ is the size of proof $\Omega_p$ for a property
$p$; $x$ indicates if $\Omega_p$ is a $\top$-TP or a ?-TP.


ANALYSIS RE-CHECK 


Model |S] |R| [AP] |?| |M] |201| |202| |203| |2os| |2s|]G1 #2 p3 $4 bs 
callee-1 5 15 3 7 31 Te 9 21l? 232 237 |- - - - - 
callee-2 5 15 3 4 31 Te 92 217; 227 x Io Ao Ao Kx 
callee-3 5 15 3 2 31 7 9 21? 237 x Vss- 
callee-4 5 15 3 0 31 x x 237 217 x XXS- 
Model [S| |R| [AP] 12| |M] [Qu] [Rua] (Rusl [Rya] [Rusl[Vi V2 v3 Ya Ys 
caller-1 6 21 5 4 52 28 x 2 % 2Wl--- -- 
caller-2 7T 22 5 4 58 302 x 2 9% 30 |v -s/y 
caller-3 6 19 5 1 50 26+ 287 27 llr 2rlY¥~ -VVVv 
caller-4 6 21 5 O 52 28+ x 27 97 287 |v vod 
Model IS] |R| [AP] [2] |M] [Qn] [2na] [2na] |@nal [Rasi |m m2 ns m ns 
caller-callee-1 6 30 6 30 61 377 27 157 37? x - - =- - - 
caller-callee-2 7 35 6 36 78 437 27 18) 43? x |W VAS - 
caller-callee-3 7 45 6 38 88 537 27 537 537 5379 |W WWW - 
caller-callee-4 6 12 4 0 42 x x x 19 «K |X X Xo KX 


that, if satisfied, ensure that the property is not violated (or possibly violated),
the smaller the proofs are, the more flexibility the designer has, as more elements
can be changed during the revision. The size of a PKS
$M = \langle S, R, S_0, AP, L \rangle$ was defined as
$|M| = |AP| \cdot |S| + |R| + |S_0|$. The size of a proof $\Omega$ was defined as
$|\Omega| = \sum_{c \in \Omega} |c|$ where: $|c| = 1$ if $c = \langle s, a, v \rangle$;
$|c| = |T|$ if $c = \langle s, T \rangle$; and $|c| = |S_0|$ if
$c = \langle S_0 \rangle$. Table 5 summarizes the obtained results (columns under
the label ANALYSIS). We show the cardinalities $|S|$, $|R|$ and $|AP|$ of the sets
of states, transitions, and atomic propositions of each considered PKS $M$, the
number $|?|$ of couples of a state $s$ with an atomic proposition $a$ such that
$L(s,a) = ?$, the total size $|M|$ of the model, and the size $|\Omega_p|_x$ of the
proofs, where $p$ indicates the considered LTL property and $x$ indicates whether
$p$ is satisfied ($x = \top$) or possibly satisfied ($x = ?$). Proofs are
$\approx$ 60% smaller than their respective initial models. Thus, we conclude that
the proofs are significantly more concise than the original models, enabling a
flexible design.
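
The two size measures can be computed directly from their definitions. The Python sketch below is illustrative only: the function names and the tuple encoding of TP-clauses are assumptions, the example proof is hypothetical, and the single initial state used for the callee-1 row is inferred from the reported values $|S| = 5$, $|R| = 15$, $|AP| = 3$ and $|M| = 31$ rather than stated in the text.

```python
def model_size(n_ap, n_states, n_trans, n_init):
    """|M| = |AP| * |S| + |R| + |S_0|."""
    return n_ap * n_states + n_trans + n_init


def proof_size(clauses):
    """|Omega| = sum of |c| over the TP-clauses c in the proof.

    Clauses are tuples: ("TPP", s, a, v), ("TPT", s, T) or ("TPI", S0).
    """
    total = 0
    for clause in clauses:
        kind = clause[0]
        if kind == "TPP":
            total += 1                 # |c| = 1 for <s, a, v>
        elif kind == "TPT":
            total += len(clause[2])    # |c| = |T| for <s, T>
        elif kind == "TPI":
            total += len(clause[1])    # |c| = |S_0| for <S_0>
        else:
            raise ValueError(f"unknown clause kind: {kind}")
    return total


# callee-1 row of Table 5, assuming a single initial state.
callee_1 = model_size(n_ap=3, n_states=5, n_trans=15, n_init=1)

# Hypothetical proof: one clause of each kind.
example_proof = [
    ("TPI", frozenset({"s0"})),
    ("TPT", "s0", frozenset({"s1", "s2"})),
    ("TPP", "s1", "a", "T"),
]
example_size = proof_size(example_proof)
print(callee_1, example_size)
```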


RE-CHECK support. We checked how the results output by the RE-CHECK
algorithm were useful in producing PKS revisions. To evaluate the usefulness
we assumed that, for each category of examples, the designer produced revisions
following the order specified in Table 5. The columns under the label RE-CHECK
contain the different properties that have been analyzed for each category. A
cell contains ✓ if the RE-CHECK was passed by the considered revised model,
i.e., a true value was returned by the RE-CHECK algorithm, ✗ otherwise. The
dash symbol - is used when the model of the corresponding line is not a revision
(i.e., the first model of each category) or when the observed property was
false in the previous model, i.e., an $x$-TP was not produced. We inspected the
results produced by the RE-CHECK algorithm to evaluate their benefit in verifying
if revisions were violating the proofs. Table 5 shows that, in $\approx$ 32% of the
cases, the TOrPEDO RE-CHECK notified the designer that the proposed revision
violated some of the clauses contained in the $\Omega$-proof, while in $\approx$ 78% the
RE-CHECK allowed designers to avoid re-running the ANALYSIS (and thus the
model checker).

Scalability. The ANALYSIS phase of TOrPEDO combines three-valued model
checking and UC computation; therefore, its scalability improves as the
performance of the underlying frameworks improves. Three-valued model checking
is as expensive as classical model checking [9], i.e., it is linear in the size of
the model and exponential in the size of the property. UC computation is
FPSPACE-complete [55]. In our cases, running TOrPEDO required on average 8.1s
for the callee examples, 8.2s for the caller examples, and 7.15s for the
caller-callee examples.⁶ However, while model checking is currently supported by
very efficient techniques, UC computation of LTL formulae is still far from being
applicable in complex scenarios. For example, we manually designed an additional
PKS with 10 states, 5 atomic propositions, and 26 transitions, and defined a
property satisfied by the PKS and with a $\top$-TP proof that requires every state
of the PKS to be constrained by a TPP-clause. We ran TOrPEDO and measured the
time required to compute this proof. Computing the proof required 1m33s. These
results show that TOrPEDO has limited scalability due to the low efficiency
of the procedure that extracts the unsatisfiable core. For an analysis of the
scalability of the extraction of the unsatisfiable core the interested reader can
refer to [58]. We believe that reporting the current lack of FM techniques to
support the proposed framework (that, as just discussed, is effective in our
preliminary evaluation) is a further contribution of this paper.


7 Related work 


Partial knowledge has been considered in requirement analysis and elic- 
itation [46,45,38,13], in novel robotic planners [40,41,43], software mod- 
els [66,65,22,1], and testing [15,63,67]. Several researchers analyzed the model 
checking problem for partially specified systems [44,12], considering both three- 
valued [37,25,9,10,28] and multi-valued [30,11] semantics. Other works apply 
model checking to incremental program development [33,6]. However, all these 
model checking approaches do not provide an explanation on why a property is 
satisfied, by means of a certificate or proof. Although several works have tackled 
this problem [4,60,50,49,29,16], differently from this work, they mostly aim to 
automate proof reproducibility. 

Tao and Li [61] propose a theoretical solution to model repair: the problem 
of finding the minimum set of states in a KS which makes a formula satisfi- 
able. However, the problem is different from the one addressed in this paper. 
Furthermore, the framework is only theoretical and based on complete systems. 

Approaches were proposed in the literature to provide explanations by using 
different artifacts. For example, some works proposed using witnesses. A witness 


⁶ Processor: 2.7 GHz Quad-Core Intel Core i7, Memory: 16 GB 2133 MHz LPDDR3.
is a path of the model that satisfies a formula of interest [7,34,48]. Other works
(e.g., [31,59]) studied how to enrich counterexamples with additional informa- 
tion in a way that allows better understanding the property violation. Work has 
also been done to generate abstractions of the counterexamples that are easier 
to understand (e.g., [21]). Alur et al. [2] analyzed the problem of synthesizing a 
controller that satisfies a given specification. When the specification is not real- 
izable, a counter-strategy is returned as a witness. Pencolé et al. [51] analyzed 
model consistency, i.e., the problem of checking whether the system run-time be- 
haviour is consistent with a formal specification. Bernasconi et al. [4] proposed an 
approach that combines model checking and deductive proofs in a multi-valued 
context. The notion of topological proof proposed in this work is substantially 
different from the notion of deductive proof. 

Some works (e.g., [52,54]) considered how to understand why a property is 
unsatisfiable. This problem is different from the one considered in this paper. 

Approaches that detect unsatisfiable cores of propositional formulae were pro- 
posed in the literature [47,39,17,32,57]. Understanding whether these approaches 
can be re-used to develop more efficient techniques to detect the unsatisfiable 
cores of LTL formulae is definitely an interesting future work direction, which 
deserves to be considered in a separate work since it is far from trivial. 


8 Conclusions 


We have proposed TOrPEDO, an integrated framework that supports the itera- 
tive creation of model revisions. The framework provides a guide for the designer 
who wishes to preserve slices of her model that contribute to satisfying fundamental
requirements while other parts of the model are modified. For these purposes, 
the notion of topological proof has been formally and algorithmically described. 
This corresponds to a set of constraints that, if kept when changing the proposed 
model, ensure that the behavior of the model w.r.t. the property of interest is pre- 
served. Our Lemmas and Theorems prove the soundness of our framework, i.e., 
how it preserves correctness in the case of PKS and LTL. The proposed frame- 
work can be used as baseline for other FM frameworks, and can be extended by 
considering other modeling formalisms that can be mapped onto PKSs. 

TOrPEDO was evaluated by showing the effectiveness of the ANALYSIS and 
RE-CHECK algorithms included in the framework. Results showed that proofs are 
smaller than the original models, and can be verified in most of the cases using 
a simple syntactic check, paving the way for an extensive evaluation on real case 
scenarios. However, the scalability of existing tools, upon which TOrPEDO is 
based, is not sufficient to efficiently support the proposed framework when bigger 
models are considered. 
Abstract. Smart contracts can be regarded as one of the most popular
blockchain-based applications. The decentralized nature of the blockchain 
introduces vulnerabilities absent in other programs. Furthermore, it is 
very difficult, if not impossible, to patch a smart contract after it has 
been deployed. Therefore, smart contracts must be formally verified be- 
fore they are deployed on the blockchain to avoid attacks exploiting these 
vulnerabilities. There is a recent surge of interest in analyzing and veri- 
fying smart contracts. While most of the existing works either focus on 
EVM bytecode or translate Solidity contracts into programs in inter- 
mediate languages for analysis and verification, we believe that a direct 
executable formal semantics of the high-level programming language of 
smart contracts is necessary to guarantee the validity of the verification. 
In this work, we propose a generalized formal semantic framework based 
on a general semantic model of smart contracts. Furthermore, this frame- 
work can directly handle smart contracts written in different high-level 
programming languages through semantic extensions and facilitates the 
formal verification of security properties with the generated semantics. 


Keywords: Blockchain · Smart contracts · Generalized semantics


1 Introduction 


Blockchain [17] technologies have been studied extensively recently. Smart con- 
tracts [16] can be regarded as one of the most popular blockchain-based applica- 
tions. Due to the very nature of the blockchain, credible and traceable transac- 
tions are allowed through smart contracts without relying on an external trusted 
authority to achieve consensus. However, the unique features of the blockchain 
introduce vulnerabilities [10] absent in other programs. 

Smart contracts must be verified for multiple reasons. Firstly, due to the de- 
centralized nature of the blockchain, smart contracts are different from programs 
written in other programming languages (e.g., C/Java). For instance, the storage 
of each contract instance is at a permanent address on the blockchain. In this 
way, each instance is a particular execution context and context switches are 
possible through external calls. Particularly, in Solidity, delegatecall executes 


© The Author(s) 2020 
H. Wehrheim and J. Cabot (Eds.): FASE 2020, LNCS 12076, pp. 75-96, 2020. 
https://doi.org/10.1007/978-3-030-45234-6_4




programs in the context of the caller rather than the recipient, making it possible 
to modify the state of the caller. Programmers must be aware of the execution 
context of each statement to guarantee the programming correctness. Therefore, 
programming smart contracts is error-prone without a proper understanding of 
the underlying semantic model. Secondly, a smart contract can be deployed on 
the blockchain by any user in the network. Vulnerabilities in deployed contracts 
can be exploited to launch attacks that lead to huge financial loss. Verifying 
smart contracts against such vulnerabilities is crucial for protecting digital as- 
sets. One famous attack on smart contracts is the DAO attack [41] in which the 
attacker exploited the reentrancy vulnerability and managed to take 60 million 
dollars under his control. Thirdly, it is very difficult, if not impossible, to patch 
a smart contract once it is deployed due to the very nature of the blockchain. 


Related Works. There is a surge of interest in analyzing and verifying smart 
contracts [32,12,24,28,26,9,25,31,21,44,20,22,38,36,4,34,43,19,30,35,29,23,46,14]. 
Some of the existing works focus on EVM [2,47] (Ethereum Virtual Machine). 
For instance, a symbolic execution engine called Oyente is proposed in [32] to 
analyze Solidity smart contracts by translating them into EVM bytecode. In ad- 
dition, a complete formal executable semantics of EVM [24] is developed in the 
K-framework to facilitate the formal verification of smart contracts at bytecode 
level. A set of test oracles is defined in [26,45] to detect security vulnerabilities 
on EVM bytecode. In [21], a semantic framework is proposed to analyze smart 
contracts at EVM level. Securify [44] translates EVM bytecode into a stackless 
representation in static-single assignment form for analyzing smart contracts. 
In other works, Solidity smart contracts are translated into programs in inter- 
mediate languages for analysis and verification. Specifically speaking, Solidity 
programs are formalized with an abstract language and then translated into 
LLVM bitcode in Zeus [28]. Similarly, Boogie is used to verify smart contracts 
as an intermediate language in the proposed verifiers in [31,23]. In addition, the 
formalization in F* [12] is an intermediate-level language for the equivalence 
checking of Solidity programs and EVM bytecode. In [22], a simple imperative 
object-based programming language, called SMAC, is used to facilitate the on- 
line detection of Effectively Callback Free (ECF) objects in smart contracts. To 
conclude, most of the existing approaches either focus on EVM bytecode, or 
translate Solidity smart contracts into programs in intermediate languages that 
are suitable for verifying smart contracts or detecting potential issues in associ- 
ated verifiers or checkers. Furthermore, none of the existing works can directly 
handle smart contracts written in different high-level programming languages 
without translating them into EVM bytecode or intermediate languages. 


Motivations. A direct executable formal semantics of the high-level smart 
contract programming language is a must for both understanding and verifying 
smart contracts. Firstly, programmers write and reason about smart contracts
at the level of source code. Without a source-level semantics, they are required
to understand how Solidity programs are compiled into EVM bytecode in order to
understand these contracts, which is far from trivial. In addition, there may be
semantic gaps between high-level smart contract programming languages and
low-level bytecode. Therefore, both high-level [27,48,49,15,11] and low-level [24,21]
semantics definitions are necessary to conduct equivalence checking to guarantee 
that security properties are preserved at both levels and reason about compiler 
bugs. Secondly, even though smart contracts can be transformed into programs 
in intermediate languages to be analyzed and verified in existing model checkers 
and verifiers, the equivalence checking of the high-level smart contract program- 
ming language and the intermediate language considered is crucial to the validity 
of the verification. For instance, most of the false positives reported in Zeus [28] 
are caused by the semantic inconsistency of the abstract language and Solidity. 


As domain-specific languages, high-level smart contract programming lan- 
guages, such as Solidity, Vyper, Bamboo, etc, intend to implement the correct 
or desired semantics of smart contracts although they may not actually achieve 
this. This means that these languages are semantically similar in order to in- 
terpret the same high-level semantics of smart contracts. For instance, Vyper is 
quite similar to Solidity in spite of syntax differences and the semantics inter- 
preted by Bamboo is consistent with that of Solidity (cf. Section 2.1 for details). 
Considering this fact, we propose a generalized formal semantic framework based 
on a general semantic model of smart contracts. Different from previous works 
which either analyze and verify smart contracts on EVM semantics or interpret 
Solidity semantics with the semantics of intermediate languages, the proposed 
framework aims to generate a direct executable formal semantics of a particular 
high-level smart contract programming language to facilitate the high-level verifi- 
cation of contracts and reason about compiler bugs. Furthermore, this framework 
provides a uniform formal specification of smart contracts, making it possible to 
apply verification techniques to contracts written in different languages. 


Challenges. The challenges of developing a generalized formal semantic 
framework mainly lie in the construction of a general semantic model of smart 
contracts. Firstly, different high-level smart contract programming languages dif- 
fer in syntax which limits state transitions. Compared with Solidity, Vyper [8] 
and Bamboo [1] have more syntax limits to exclude some vulnerabilities reported 
in Solidity. For instance, Vyper eliminates gasless send by blocking recursive 
calls and infinite loops, and reentrancy attacks by excluding the possibility 
of state changes after external calls [40]. In addition, there are no state variables 
in Bamboo and each contract represents a particular execution state, making 
it possible to limit operations to certain states to prevent attacks. Therefore, 
we need to take into account the syntax differences when constructing a gen- 
eral semantic model for smart contracts. Secondly, semantics developed with the 
general semantic model must be direct to guarantee the validity of the verifi- 
cation. For instance, as discussed above, even though intermediate languages 
may be a good solution to construct a general semantic model, they introduce 
semantic-level equivalence checking issues due to pure syntax translations. 


Contributions. In this work, we develop a generalized formal semantic 
framework for smart contracts. The contributions of this work lie in three as- 
pects. Firstly, our work is the first approach, to our knowledge, to a generalized 
formal semantic framework for smart contracts which can directly handle con- 
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tracts written in different high-level programming languages. Secondly, a gen- 
eral semantic model of smart contracts is constructed with rewriting logic in the 
K-framework. With the general semantic model, a direct executable formal se- 
mantics of a particular high-level smart contract programming language can be 
constructed as long as its core features fall into the ones defined in this model. 
The general semantic model is validated with its interpretation in Solidity using 
the Solidity compiler test set [6] and evaluation results show that it is complete 
and correct. Lastly, the generated semantics facilitates the formal verification 
of smart contracts written in a particular high-level programming language as 
a formal specification of the corresponding language. Together with low-level 
specifications [24,21], it allows us to conduct equivalence checking on high-level 
programs and low-level bytecode to reason about compiler bugs and guarantee 
that security properties are preserved at both levels. 

Outline. The remaining part of this paper is organized as follows. In Sec- 
tion 2, we introduce smart contracts and the K-framework. The general semantic 
model of smart contracts is introduced in Section 3. In Section 4, we take Solidity 
as an example to illustrate how to generate a direct executable formal semantics 
of a particular high-level smart contract programming language based on the 
general semantic model. Section 5 shows the evaluation results of the proposed 
framework. Section 6 concludes this work. 


2 Preliminaries 


In this section, we briefly introduce smart contracts and the K-framework. 


2.1 Smart Contracts 


Solidity Smart Contracts. Ethereum [2,47], proposed in late 2013 by Vita- 
lik Buterin, is a blockchain-based distributed computing platform supporting 
the functionality of smart contracts. It provides a decentralized international 
network where each participant node equipped with EVM can execute smart 
contracts. It also provides a cryptocurrency called “ether” (ETH) which can be 
transferred between different accounts and used to compensate participant nodes 
for their computations on smart contracts. 

Solidity is one of the high-level programming languages to implement smart 
contracts on Ethereum. A smart contract written in Solidity can be compiled 
into EVM bytecode and executed by any participant node equipped with EVM. 
A Solidity smart contract is a collection of code (its functions) and data (its 
state) that resides at a specific address on the Ethereum blockchain [7]. Fig. 1 
shows an example of Solidity smart contracts, named Coin, implementing a 
very simple cryptocurrency. In line 2, the public state variable minter of type 
address is declared to store the address of the minter of the cryptocurrency, i.e., 
the owner of the smart contract. The constructor, denoted by constructor(),
is defined in lines 5-7. Once the smart contract is created and deployed, its

 1  contract Coin {
 2      address public minter;
 3      mapping (address => uint) public balances;
 4
 5      constructor() public {
 6          minter = msg.sender;
 7      }
 8
 9      function mint(address receiver, uint amount) public {
10          if (msg.sender != minter) return;
11          balances[receiver] += amount;
12      }
13
14      function send(address receiver, uint amount) public {
15          if (balances[msg.sender] < amount) return;
16          balances[msg.sender] -= amount;
17          balances[receiver] += amount;
18      }
19  }


Fig. 1. Solidity Smart Contract Example 


constructor is invoked automatically, and minter is set to be the address of 
its creator (owner), represented by the built-in keyword msg.sender. In line 3,
the public state variable balances is declared to store the balances of users. 
It is of type mapping, which can be considered as a hash-table mapping from 
keys to values. In this example, balances maps from a user (represented as an 
address) to his/her balance (represented as an unsigned integer value). The mint 
function, defined in lines 9-12, is supposed to be invoked only by its owner to 
mint coins, the number of which is specified by amount, for the user located 
at the receiver address. If mint is called by anyone except the owner of the 
contract, nothing will happen because of the guarding if statement in line 10. 
The send function, defined in lines 14-18, can be invoked by any user to transfer 
coins, the number of which is specified by amount, to another user located at the 
receiver address. If the balance is not sufficient, nothing will happen because 
of the guarding if statement in line 15; otherwise, the balances of both sides 
will be updated accordingly. 
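
The behaviour of Coin described above can be mimicked outside the blockchain. The following Python sketch is only an illustration: the class name, the explicit sender argument (standing in for msg.sender) and the dictionary used for the mapping are assumptions of this sketch, and neither gas nor integer overflow is modelled.

```python
from collections import defaultdict


class Coin:
    """Toy model of the Coin contract in Fig. 1 (not EVM-accurate)."""

    def __init__(self, sender):
        # The constructor runs once at deployment; the deployer becomes the minter.
        self.minter = sender
        # A Solidity mapping yields 0 for unknown keys, hence defaultdict(int).
        self.balances = defaultdict(int)

    def mint(self, sender, receiver, amount):
        if sender != self.minter:            # guard in line 10 of Fig. 1
            return
        self.balances[receiver] += amount

    def send(self, sender, receiver, amount):
        if self.balances[sender] < amount:   # guard in line 15 of Fig. 1
            return
        self.balances[sender] -= amount
        self.balances[receiver] += amount


coin = Coin(sender="alice")
coin.mint("bob", "bob", 100)     # ignored: bob is not the minter
coin.mint("alice", "bob", 100)   # accepted
coin.send("bob", "carol", 30)    # accepted
coin.send("carol", "bob", 50)    # ignored: insufficient balance
print(dict(coin.balances))
```

Both guards return silently instead of raising an error, which mirrors the "nothing will happen" behaviour of the if statements in Fig. 1.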


A blockchain is actually a globally-shared transactional database or ledger. 
If one wants to make any state change on the blockchain, he or she has to cre- 
ate a so-called transaction which has to be accepted and validated by all other 
participant nodes. Furthermore, once a transaction is applied to the blockchain, 
no other transactions can alter it. For example, deploying the Coin smart con- 
tract generates a transaction because the state of the blockchain is going to be 
changed, i.e., one more smart contract instance will be included. Similarly, any 
invocation of the function mint or send also generates a transaction because 
the state of the contract instance, which is a part of the whole blockchain, is 
going to be changed. Transactions have to be selected and added into blocks to 
be appended to the blockchain. This procedure is the so-called mining, and the 
participant nodes are called miners. 


Vyper Smart Contracts. Vyper is a high-level programming language for 
smart contracts running on EVM. As an alternative to Solidity, Vyper is con- 
minter: public(address)
balances: map(address, wei_value)

@public
def __init__():
    self.minter = msg.sender

@public
def mint(receiver: address, amount: wei_value):
    if (msg.sender != self.minter): return
    self.balances[receiver] += amount

@public
def send(receiver: address, amount: wei_value):
    if (self.balances[msg.sender] < amount): return
    self.balances[msg.sender] -= amount
    self.balances[receiver] += amount


Fig. 2. Vyper Smart Contract Example 


sidered to be more secure by blocking recursive calls and infinite loops to avoid 
gasless send, and excluding the possibility of state changes after external calls 
to prevent reentrancy attacks [40]. Thus, it is more difficult to write vul- 
nerable code in Vyper. In addition, it supports bounds and overflow checking, 
and strong typing. Particularly, timing features such as block timestamps are 
supported as types, making it possible to detect the vulnerability of timestamp 
dependence [32] on Vyper semantics. This is not possible on Solidity semantics 
since Solidity does not support timing features. Apart from security, simplicity 
is another goal of Vyper. It aims to provide a more human-readable language, 
and a simpler compiler implementation. An example Vyper smart contract cor- 
responding to the Solidity smart contract illustrated in Fig. 1 is shown in Fig. 2. 


Bamboo Smart Contracts. Bamboo is another high-level programming 
language for Ethereum smart contracts. In Bamboo, state variables are elimi- 
nated and each contract represents a particular execution state, making state 
transitions explicit to avoid reentrancy attacks by default. This is because 
operations in functions are limited to certain states. An example Bamboo smart 
contract which is equivalent to the Solidity smart contract illustrated in Fig. 1 is 
shown in Fig. 3. In this example, explicit state transitions are applied to strictly 
limit operations in the constructor to a certain state. To be specific, the default 
part in the contract PreCoin which is equivalent to the constructor in Fig. 1 can 
only be invoked once, after which the state is always Coin. This is consistent 
with the fact that the constructor of a Solidity smart contract is only invoked 
once when a new contract instance is created. 
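
The idea of explicit state transitions can be sketched in Python as follows. This is an illustrative analogy rather than Bamboo itself: each contract state is modelled as an immutable object, every call returns the state the contract becomes, and the class and method names are chosen here only to mirror Fig. 3.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PreCoin:
    balances: dict = field(default_factory=dict)

    def default(self, sender):
        # "return then become Coin(sender(msg), balances)"
        return Coin(sender, dict(self.balances))


@dataclass(frozen=True)
class Coin:
    minter: str
    balances: dict

    def mint(self, sender, receiver, amount):
        if sender != self.minter:
            return self                      # become Coin(minter, balances) unchanged
        updated = dict(self.balances)
        updated[receiver] = updated.get(receiver, 0) + amount
        return Coin(self.minter, updated)

    def send(self, sender, receiver, amount):
        if self.balances.get(sender, 0) < amount:
            return self
        updated = dict(self.balances)
        updated[sender] = updated.get(sender, 0) - amount
        updated[receiver] = updated.get(receiver, 0) + amount
        return Coin(self.minter, updated)


state = PreCoin().default(sender="alice")   # the only transition out of PreCoin
state = state.mint("alice", "bob", 10)
state = state.send("bob", "carol", 4)
print(state)
```

Since Coin offers no default operation, the constructor-like step cannot be repeated once the state has become Coin, which is the property discussed above.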


Comparison. As introduced above, Vyper smart contracts are similar to So- 
lidity smart contracts regardless of the differences in syntax formats. Compared 
with Solidity, Vyper simply excludes the vulnerabilities reported in Solidity at 
syntax level. Apart from the syntax differences, explicit state transitions are ap- 
plied in Bamboo to prevent potential attacks. Despite the limits in syntax and 
state transitions, high-level smart contract programming languages have a lot in 
common in semantics due to the fact that they have to be functionally the same. 
contract PreCoin(address => uint balances) {
    default {
        return then become Coin(sender(msg), balances);
    }
}

contract Coin(address minter, address => uint balances) {
    case(void mint(address receiver, uint amount)) {
        if (sender(msg) != minter)
            return then become Coin(minter, balances);
        balances[receiver] = balances[receiver] + amount;
        return then become Coin(minter, balances);
    }
    case(void send(address receiver, uint amount)) {
        if (balances[sender(msg)] < amount)
            return then become Coin(minter, balances);
        balances[sender(msg)] = balances[sender(msg)] - amount;
        balances[receiver] = balances[receiver] + amount;
        return then become Coin(minter, balances);
    }
}


Fig. 3. Bamboo Smart Contract Example 


2.2 The K-framework 


The K-framework (K) [39] is a framework for defining formal executable semantics, 
based on rewriting logic [33]. The semantics of various programming languages, 
such as Java [13] and C [18], have been defined using K. In particular, an 
executable semantics of EVM [24], the bytecode language of smart contracts, has 
been constructed in the K-framework. K backends, such as the Isabelle theory 
generator, the model checker, and the deductive verifier, can be utilized to 
prove properties on the semantics and construct verification tools [42]. 

A language semantics definition in the K-framework consists of three main 
parts, namely the language syntax, the configuration specified by the developer 
and a set of rules constructed based on the syntax and the configuration. Given 
a semantics definition and some source programs, the K-framework executes 
the source programs based on the semantics definition. In addition, specified 
properties can be verified by the formal analysis tools in K backends. We take 
IMP [37], a simple imperative language, as an example to show how to define a 
language semantics in the K-framework. 

The configuration of the IMP language is shown in Fig. 4. There are only 
two cells, namely k and state, in the whole configuration cell T. The cells in 
the configuration are used to store some information related to the program 
execution. For instance, the cell k stores the program for execution Pgm, and in 
the cell state a map is used to store the variable state. 


<T>
  <k> $PGM:Pgm </k>
  <state> .Map </state>
</T>


Fig. 4. IMP Configuration 


Here, we introduce some basic rules in the K-IMP semantics. These rules are 
allocate, read and write. The syntax of IMP is also given in Fig. 5. 
Pgm   ::= "int" Ids ";" Stmt
Ids   ::= List{Id, ","}
AExp  ::= Int | Id | "-" Int | AExp "/" AExp > AExp "+" AExp | "(" AExp ")"
BExp  ::= Bool | AExp "<=" AExp | "!" BExp > BExp "&&" BExp | "(" BExp ")"
Block ::= "{" "}" | "{" Stmt "}"
Stmt  ::= Block | Id "=" AExp ";" | "if" "(" BExp ")" Block "else" Block
        | "while" "(" BExp ")" Block > Stmt Stmt


Fig. 5. Syntax of IMP 


RULE ALLOCATE
  <k> int X,Xs;S => int Xs;S </k>
  <state> Rho:Map (.Map => X |-> 0) </state>
  requires notBool (X in keys(Rho))

RULE FINISH-ALLOCATE
  <k> int .Ids;S => S </k>

RULE READ
  <k> X:Id => I ... </k>
  <state> ... X |-> I ... </state>

RULE WRITE
  <k> X = I:Int; => . ... </k>
  <state> ... X |-> (_ => I) ... </state>


Let us start with the rule for memory allocation in IMP, shown in ALLOCATE. 
When Pgm, interpreted as int X,Xs;S, is encountered, we need to store the list of 
variables (X,Xs), starting from X, in the cell state as a list of mappings. Here 
state can be regarded as physical memory or storage, and Xs is a (possibly empty) 
list of variables. X is popped out of the cell k and a new mapping 
from X to 0 is created in the cell state, which means that a memory slot has 
been allocated for X to store its initial value 0. Duplicate names are not allowed 
in state, which is guaranteed by the requires condition. This step is repeated until 
Xs becomes empty, which means that all the variables have been stored 
in state. At this point, the execution of the first part of Pgm has finished 
and we proceed to the execution of the statement S. This is summarized 
in FINISH-ALLOCATE, where .Ids is an empty list of identifiers, which means 
that the variable list is empty. Note that . denotes an empty set in the 
K-framework. If a rule rewrites to ., nothing further will be executed. 

Then we come to the rules of read and write for variables. As shown in 
READ, if we want to look up the value of the variable X, we need to search 
it in the cell state by mapping the variable name X to its value I. So the 
evaluation of this expression X is its value I. If we cannot find a mapping for X, 
the program execution will stop at this point. Particularly, ... means there can 
be something in the corresponding position. For instance, the mapping of X can 
be in any position in the cell state. However, for rules in the cell k, ... can only 
be at the end since the program which is stored in k is executed sequentially. 
As illustrated in WRITE, if we want to assign the integer I to the variable X, 
similarly we need to search it in state by mapping the variable name. We also 
need to rewrite the value of X, denoted by “_” which is a placeholder, to I. 
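
To make the effect of these rules concrete, the following Python sketch mimics ALLOCATE, FINISH-ALLOCATE, READ and WRITE on a dictionary that plays the role of the cell state. It is an informal analogy, not K: the function names are chosen for this sketch, and a rule that cannot fire (the execution getting stuck) is modelled here by raising an exception.

```python
def allocate(names, state):
    """ALLOCATE / FINISH-ALLOCATE: bind each declared name to 0, one at a time."""
    while names:                          # an empty list corresponds to .Ids
        x, names = names[0], names[1:]    # pop X, keep Xs
        if x in state:                    # requires notBool (X in keys(Rho))
            raise RuntimeError(f"stuck: {x} is already declared")
        state[x] = 0
    return state


def read(x, state):
    """READ: X rewrites to I if the state contains X |-> I."""
    if x not in state:
        raise RuntimeError(f"stuck: no mapping for {x}")
    return state[x]


def write(x, value, state):
    """WRITE: replace whatever X maps to (the "_" placeholder) by I."""
    if x not in state:
        raise RuntimeError(f"stuck: no mapping for {x}")
    state[x] = value


state = allocate(["x", "y"], {})
write("x", 5, state)
print(read("x", state), read("y", state))
```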

Rewriting logic facilitates the construction of a general semantic model for 
smart contracts. This is because a rewriting logic style semantics consists of a set 
of rewriting steps from the language syntax to its evaluations. In spite of syntax 
differences, different smart contract languages have a lot in common in logical 
RULE ALLOCATE-GENERAL
  <k> #allocate(X, I) => . ... </k>
  <state> Rho:Map (.Map => X |-> I) </state>
  requires notBool (X in keys(Rho))

RULE READ-GENERAL
  <k> #read(X) => I ... </k>
  <state> ... X |-> I ... </state>

RULE WRITE-GENERAL
  <k> #write(X, I) => . ... </k>
  <state> ... X |-> (_ => I) ... </state>

RULE ALLOCATE-IMP
  <k> int X,Xs;S => #allocate(X, 0) ~> int Xs;S </k>

RULE READ-IMP
  <k> X:Id => #read(X) ... </k>

RULE WRITE-IMP
  <k> X = I:Int; => #write(X, I) ... </k>



MemoryOperations ::= #read(Id) | #write(Id, Int) | #allocate(Id, Int) 


Fig. 6. Syntax of General Memory Operations 


aspects to achieve the equivalent functionality. Rewriting logic makes it possible 
to separate the language syntax from the common logical aspects based on which 
the general semantic model is constructed. The semantics rules introduced above 
can be general and not specific to IMP. We show the general rules for read, 
write and allocate in READ-GENERAL, WRITE-GENERAL and ALLOCATE- 
GENERAL, respectively. In these rules, #read, #write and #allocate represent 
the functions to read, write and allocate memory slots for variables with specified 
parameters and their syntax is shown in Fig. 6. The semantics rules for memory 
operations in IMP can be obtained by rewriting the corresponding IMP syntax 
to the general memory operations defined above, namely #read, #write and 
#allocate, which form a general semantic model. The semantics rules for read, 
write and allocate in IMP based on the general semantic model are shown in 
READ-IMP, WRITE-IMP and ALLOCATE-IMP, respectively. Particularly, the 
symbol ~> means “followed by”. The semantics rules interpreted with the internal 
semantics of the general memory operations defined in Fig. 6 are equivalent to 
those developed from scratch, namely READ, WRITE and ALLOCATE. Rather 
than pure syntax translations to intermediate languages, a general semantic 
model enables semantic-level mappings to commonly shared high-level features. 
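
The two-layer structure described above can be sketched in Python. The encoding is an assumption made for illustration only: IMP constructs are represented as tuples, translate_imp plays the role of the IMP-specific rules, and run_general plays the role of the language-independent rules for #allocate, #write and #read.

```python
def translate_imp(stmt):
    """Rewrite a tiny IMP fragment to language-independent memory operations."""
    kind = stmt[0]
    if kind == "decl":        # int X,Xs;   (ALLOCATE-IMP)
        return [("#allocate", x, 0) for x in stmt[1]]
    if kind == "assign":      # X = I;      (WRITE-IMP)
        return [("#write", stmt[1], stmt[2])]
    if kind == "lookup":      # X           (READ-IMP)
        return [("#read", stmt[1])]
    raise ValueError(f"unsupported IMP construct: {kind}")


def run_general(ops, state):
    """Execute general memory operations; return the values produced by #read."""
    results = []
    for op, x, *args in ops:
        if op == "#allocate":
            if x in state:
                raise RuntimeError(f"stuck: {x} is already allocated")
            state[x] = args[0]
        elif x not in state:
            raise RuntimeError(f"stuck: no mapping for {x}")
        elif op == "#write":
            state[x] = args[0]
        elif op == "#read":
            results.append(state[x])
    return results


program = [("decl", ["x", "y"]), ("assign", "x", 7), ("lookup", "x")]
ops = [op for stmt in program for op in translate_imp(stmt)]
state = {}
results = run_general(ops, state)
print(results, state)
```

Only translate_imp knows about IMP; supporting another surface syntax would require a different translation function while run_general stays unchanged.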


3 A General Semantic Model 


Different high-level smart contract programming languages vary in syntax but 
have a lot in common semantically to achieve the equivalent functionality. Con- 
sidering this fact, we construct a general semantic model for smart contracts 
based on the commonly shared high-level semantic features that are indepen- 
dent of any specific language or platform. The semantics of a high-level smart 
contract programming language can be summarized into three aspects in terms 
of its functionality, namely memory operations, new contract instance creations 
and function calls. Particularly, new contract instance creations and function 
calls are the two kinds of transactions on the blockchain. In this section, we 
present an overview of the desired semantics of these three core features. 


3.1 Syntax 


The syntax of the general semantic model is defined in the K-framework and 
shown in Fig. 7. Due to space limitations, we only present the syntax of rewriting 
steps related to memory operations, new contract instance creations and function 
calls with MemOp, NewInstanceCreation and InstanceStateUpdate, respec- 
tively. Particularly, ExpressionList is a list of Expressions. TypeName consists 
of ElementaryTypeName which takes one memory slot, ComplexTypeName which 
is composed of a set of ElementaryTypeNames, and ReferenceTypeName which 
refers to a pre-defined instance. For Solidity, ElementaryTypeName consists of 
all the elementary types defined in the official documentation [7] except Byte. 
ComplexTypeName refers to mappings, arrays and Byte. ReferenceTypeName in- 
volves user-defined types and function types. Id stands for identifiers. Int and 
Bool represent integers and Boolean values, respectively. Values, a subset of 
ExpressionList, is a list of Value types which can be integers (Int) or Boolean 
types (Bool). Msg is the type of transaction information. VarInfo stores variable 
information. MemberAccess deals with expressions in member access formats. 


RewritingSteps ::= MemOp | NewInstanceCreation | InstanceStateUpdate

MemOp ::= read(Expression) | readAddress(Int, Id) | write(Expression, Value)
        | writeAddress(Int, Id, Value) | allocate(Int, VarInfo)
        | allocateAddress(Int, Int, Id, Value)

NewInstanceCreation ::= createNewInstance(Id, ExpressionList)
        | updateState(Id) | allocateStorage(Id)
        | initInstance(Id, ExpressionList)

InstanceStateUpdate ::= functionCall(Expression; Expression; Id; ExpressionList; Msg)
        | functionCall(Id; ExpressionList)
        | switchContext(Int, Int, Id, Msg) | returnContext(Int)
        | exception() | updateExceptionState() | revertState()

Expression     ::= Id | Value | Msg | VarInfo | MemberAccess
ExpressionList ::= List{Expression, ","} | Values
Value          ::= Int | Bool
Values         ::= List{Value, ","}
Msg            ::= #msgInfo(Int, Int, Int, Int)
VarInfo        ::= #varInfo(Id, TypeName, Id, Value)
MemberAccess   ::= #memberAccess(Expression, Id)
TypeName       ::= ElementaryTypeName | ComplexTypeName | ReferenceTypeName


Fig. 7. Syntax of the General Semantic Model 


3.2 Configuration 


The runtime configuration indicates program states at each execution step, mak- 
ing detailed runtime features available. The runtime configuration of the general 
semantic model is illustrated in Fig. 8. Due to space limitations, only a part of the 
cells is presented here. In this configuration, there are six main cells in the whole 
configuration cell T and they are k, controlStacks, contracts, functions, 
contractInstances and transactions. The value of each cell is initialized in 
the configuration with its type specified. A dot followed by any type represents 
an empty set of this type. For instance, .List is an empty list. Particularly, K is 
the most general type which can be any specific type defined in the K-framework. 


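<T>
  <k> $PGM:SourceUnit </k>
  <controlStacks>
    <contractStack> ListItem(-1) </contractStack>
    <functionStack> .List </functionStack>
    <newStack> .List </newStack>
    <blockStack> .List </blockStack>
  </controlStacks>
  <contracts>
    <cntContractDefs> 0:Int </cntContractDefs>
    <contract multiplicity="*">
      <cName> .K </cName>
      <stateVars> .List </stateVars>
      <Constructor> false </Constructor>
    </contract>
  </contracts>
  <functions>
    <cntFunctions> 0:Int </cntFunctions>
    <function multiplicity="*">
      <fId> 0:Int </fId>
      <fName> .K </fName>
      <inputParameters> .K </inputParameters>
      <returnParameters> .K </returnParameters>
      <Body> .K </Body>
      <funQuantifiers> .K </funQuantifiers>
      ...
    </function>
  </functions>
  <contractInstances>
    ...
    <contractInstance multiplicity="*">
      <ctId> (-1):Int </ctId>
      <ctName> .K </ctName>
      <ctContext> .Map </ctContext>
      <globalContext> .Map </globalContext>
      ...
    </contractInstance>
  </contractInstances>
  <transactions>
    <cntTrans> 1:Int </cntTrans>
    <tranComputation> 0 |-> "Main" </tranComputation>
    <Msg> .K </Msg>
    <msgStack> .List </msgStack>
  </transactions>
</T>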


Fig. 8. Runtime Configuration of the General Semantic Model 


In k, source programs, called SourceUnit, are stored for execution. If the 
programs stored in k terminate in a proper way, there will be a dot in this cell, 
indicating that this cell is empty and there are no more programs to execute. 

controlStacks consists of contractStack, functionStack, newStack and 
blockStack. To be specific, contractStack keeps track of the current contract 
instance. functionStack stores a list of function calls. newStack records a list 
of new contract instance creations. blockStack stores a list of variable contexts 
to look up and assign values to variables in different scopes. 

In contracts, a set of contract definitions is stored. Each cell contract 
represents a contract definition. The number of distinct contracts is counted in 
cntContractDefs. In contract, the contract name is stored in cName. State 
variable information is stored in stateVars. In addition, Constructor indicates 
whether the contract has a constructor or not and its initial value is false. 


Similarly, functions stores a set of function definitions. Each cell function 
represents a function definition. The total number of function definitions is stored 
in cntFunctions. For each function definition, the function Id and the function 
name are stored in fId and fName, respectively. In addition, function parameters, 
including input parameters and return parameters, are recorded in the corresponding 
cells. We also store the function body in the cell Body and the function 
quantifiers, which can be modifiers or specifiers, in the cell funQuantifiers. 

In contractInstances, there is a set of contract instances. Each cell 
contractInstance represents a contract instance. The number of contract instances is 
counted in cntContracts. We store the contract instance Id and the name of 
its associated contract in the cells ctId and ctName, respectively. Four different 
mappings keep track of further information about a variable. Specifically, 
ctContext maps a variable name to its logical address in the storage or memory, 
ctType maps a variable name to its type, ctLocation maps a variable name to its 
location information, namely “global” or “local”, and ctStorage/Memory maps the 
logical address of a variable in the storage or memory to its value. 
globalContext keeps track of the state 
variable context. The number of memory slots taken by variables is calculated 
in slotNum. The cell Balance records the balance of each contract instance. 

In the cell transactions, we keep track of the number of transactions in 
cntTrans, every transaction in tranComputation and also “msg” information 
in Msg and msgStack. “msg” is a keyword in smart contracts to represent trans- 
action information. For instance, “msg.sender” is the caller of the function and 
“msg.value” specifies the amount of ether to be transferred in Solidity. The cell 
msgStack stores a list of transaction information tuples while Msg records the 
current one. We simulate transactions of smart contracts with a “Main” contract 
which is similar to the main function in C. In the “Main” contract, new contract 
instances can be created and external function calls to these instances are avail- 
able. The Id of the “Main” contract is “-1”, since other contract instances start 
from 0. Therefore, the initialized content in contractStack is ListItem(-1), 
and cntTrans is counted from 1, which means that the creation of the “Main” 
contract is the first transaction recorded in tranComputation. 


3.3 Semantics of the Core Features 


We introduce the semantics rules for the core features in smart contracts. Due to 
space limitations, the implementation details (cf. [3]) of the sub-steps are omitted. 

Memory Operations. We present an overview of the semantics rules for 
memory operations on elementary types, such as int, uint and address in So- 
lidity, each of which takes only one memory slot. Complex types, such as arrays, 
mappings, etc, are compositions of elementary types. A memory operation on a 
complex type can be regarded as a set of recursive memory operations on elemen- 
tary types. For instance, the memory allocation for a one-dimensional fixed-size 
array is equivalent to allocating an elementary type for each index of this array. 
Reading and writing a particular index involve recursive steps to retrieve the 
logical address of this index from the base address of the array. Mappings are 
similar to dynamic arrays. For a mapping from address to uint, the memory 
allocation for this mapping is equivalent to allocating an unsigned integer type 
at each address involved. Reference types which refer to pre-defined instances 
can be simply implemented as mappings in the K-framework. 
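
The reduction of complex types to elementary ones can be illustrated with a small Python sketch. The type encoding (a string for an elementary type, a tuple ("array", element_type, length) for a fixed-size array) and the contiguous slot layout are assumptions of this sketch and are not taken from the semantics itself.

```python
def allocate(storage, slot, type_):
    """Allocate `type_` starting at `slot`; return the next free slot."""
    if isinstance(type_, str):            # elementary type: exactly one slot
        storage[slot] = 0
        return slot + 1
    _, element, length = type_            # ("array", element_type, length)
    for _ in range(length):               # one recursive allocation per index
        slot = allocate(storage, slot, element)
    return slot


def size(type_):
    """Number of elementary slots occupied by `type_`."""
    if isinstance(type_, str):
        return 1
    _, element, length = type_
    return length * size(element)


def address_of(base, type_, indices):
    """Resolve a list of indices, step by step, to the slot of one element."""
    for i in indices:
        _, element, length = type_
        if not 0 <= i < length:
            raise IndexError(i)
        base += i * size(element)
        type_ = element
    return base


matrix = ("array", ("array", "uint", 3), 2)   # uint[3][2]-like shape
storage = {}
next_free = allocate(storage, 0, matrix)
storage[address_of(0, matrix, [1, 2])] = 9
print(next_free, storage)
```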
RULE READ
  <k> read(X:Id) => readAddress(Addr, L) ... </k>
  <contractInstance>
    <ctId> N </ctId>
    <ctContext> ... X |-> Addr ... </ctContext>
    <ctLocation> ... X |-> L ... </ctLocation>
    <ctType> ... X |-> T:ElementaryTypeName ... </ctType>
    ...
  </contractInstance>
  <contractStack> ListItem(N:Int) ... </contractStack>

RULE WRITE
  <k> write(X:Id, V:Value) => writeAddress(Addr, L, V) ... </k>
  <contractInstance>
    <ctId> N </ctId>
    <ctContext> ... X |-> Addr ... </ctContext>
    <ctLocation> ... X |-> L ... </ctLocation>
    <ctType> ... X |-> T:ElementaryTypeName ... </ctType>
    ...
  </contractInstance>
  <contractStack> ListItem(N:Int) ... </contractStack>

RULE ALLOCATE
  <k> allocate(N:Int, #varInfo(X:Id, T:ElementaryTypeName, L:Id, V:Value))
      => allocateAddress(N, Addr, L, V) ... </k>
  <contractInstance>
    <ctId> N </ctId>
    <slotNum> Addr => Addr +Int 1 </slotNum>
    <ctContext> CON:Map (.Map => X |-> Addr) </ctContext>
    <ctType> TYPE:Map (.Map => X |-> T) </ctType>
    <ctLocation> LOC:Map (.Map => X |-> L) </ctLocation>
    ...
  </contractInstance>

RULE NEW-CONTRACT-INSTANCE-CREATION
  <k> createNewInstance(X:Id, E:ExpressionList)
      => updateState(X) ~> allocateStorage(X) ~> initInstance(X, E) ... </k>

RULE FUNCTION-CALL
  <k> functionCall(C:Int; R:Int; F:Id; Es:Values; M:Msg)
      => switchContext(C, R, F, M) ~> functionCall(F; Es) ~> returnContext(R) ... </k>


Let us start with the read operation on elementary types shown in READ. 
Here, we consider the object X as a variable which is an Id type. The first thing 
to do is to get the current execution context. This is achieved by retrieving the 
current contract instance Id N in contractStack and mapping the corresponding 
contract instance with N in the cell ctId. After that, we retrieve the logical 
address of X, denoted by Addr, in ctContext and the location information of 
X, denoted by L, in ctLocation. With these two parameters, we can obtain the 
evaluation of X through readAddress which retrieves the value located at Addr 
in the associated cell specified by L. To be specific, if L specifies this variable 
as a global one, the search space is ctStorage. Otherwise, the value is retrieved 
in Memory. write is similar to read. After retrieving the logical address of X, 
denoted by Addr, and the location information of X, denoted by L, we rewrite 
the value at Addr to the value V in the cell specified by L through writeAddress. 


Then we come to the allocation for elementary types shown in ALLOCATE. 
The first input parameter N indicates the object contract instance Id. The vari- 
able information including the name X, the type T, the location information L 
and the initial value V, is stored in #varInfo. First, we retrieve the correspond- 
ing instance by mapping the Id N in ctId. Then the number of memory slots is 
increased by 1 in slotNum. After that, the variable information is recorded in the 
associated cells. To be specific, we record the logical address Addr, the type T, and 
the location information L in ctContext, ctType and ctLocation, respectively. 
Finally, a memory slot is allocated for this variable through allocateAddress. 
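
The interplay of the cells used by READ, WRITE and ALLOCATE can be sketched in Python as follows. Each contract instance is modelled as a dictionary whose keys are named after the cells; taking the current slotNum as the new logical address and sharing one counter between storage and memory are assumptions made for this sketch.

```python
def new_instance(ct_id, ct_name):
    return {"ctId": ct_id, "ctName": ct_name, "slotNum": 0,
            "ctContext": {}, "ctType": {}, "ctLocation": {},
            "ctStorage": {}, "Memory": {}}


def cell_for(instance, location):
    # "global" variables live in ctStorage, all others in Memory.
    return instance["ctStorage"] if location == "global" else instance["Memory"]


def allocate(instances, n, name, type_name, location, value):
    inst = instances[n]                    # look up the instance by ctId
    addr = inst["slotNum"]                 # assumption: next free slot
    inst["slotNum"] += 1
    inst["ctContext"][name] = addr
    inst["ctType"][name] = type_name
    inst["ctLocation"][name] = location
    cell_for(inst, location)[addr] = value   # role of allocateAddress


def read(instances, contract_stack, name):
    inst = instances[contract_stack[0]]    # current instance: top of the stack
    addr = inst["ctContext"][name]
    return cell_for(inst, inst["ctLocation"][name])[addr]    # readAddress


def write(instances, contract_stack, name, value):
    inst = instances[contract_stack[0]]
    addr = inst["ctContext"][name]
    cell_for(inst, inst["ctLocation"][name])[addr] = value   # writeAddress


instances = {0: new_instance(0, "Coin")}
contract_stack = [0, -1]                   # instance 0 called from "Main"
allocate(instances, 0, "minter", "address", "global", 42)
allocate(instances, 0, "tmp", "uint", "local", 0)
write(instances, contract_stack, "tmp", 7)
print(read(instances, contract_stack, "minter"), read(instances, contract_stack, "tmp"))
```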


New Contract Instance Creations. As illustrated in NEW-CONTRACT-INSTANCE-CREATION, 
the contract name X and the constructor arguments E are taken as input 
parameters to create a new instance of X. There 
are altogether three sub-steps for this transaction and they are updateState, 
allocateStorage and initInstance. To be specific, updateState updates the 
blockchain states, including the states of contract instances and transactions, 
and the stack information to indicate the new contract instance creation. In ad- 
dition, allocateStorage allocates state variables and initInstance deals with 
initialization issues, such as calling the constructor, in the new instance. 

Function Calls. In order to make the semantics of function calls general 
for all kinds of calls and extensible for different smart contract languages, a 
uniform format is applied to generalize the semantics. The uniform format is 
functionCall(Id_of_Caller; Id_of_Recipient; Function_Name; Arguments; 
Msg_Info). Particularly, Msg_Info represents the transaction information, in- 
cluding the Ids of the caller and the recipient instances, the value of digital 
assets to be transferred and the transaction fees to be consumed. The semantics 
rule for function calls based on this format is shown in FUNCTION-CALL. 


In the rule FUNCTION-CALL, the caller of this function is C and the recipient 
is R. F is the function name and Es specifies the function call arguments. M is the 
“msg” information to keep track of transactions. In particular, the types of these 
parameters have been specified. The semantics of function calls is designed from 
a general point of view. Each external function call is regarded as an extension 
of an internal function call. Whenever there is an external function call, we first 
switch to the recipient instance and then call the function in this instance as an 
internal call. Finally, we switch back to the caller instance. In this way, external 
function calls can be achieved through internal function calls and switches of 
contract instances. This mechanism also applies to internal function calls where 
the caller is the same as the recipient. There are three sub-steps in FUNCTION- 
CALL. The first one is to switch to the recipient instance from the caller through 
switchContext. The second is an internal function call functionCall. The last 
one is to return to the caller instance through returnContext. 
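
The three sub-steps can be sketched in Python as shown below. Since the internals of switchContext and returnContext are omitted in this paper, their bodies here (pushing and popping the contract stack and the message stack) are assumptions of this sketch, as is the dictionary named world that stands in for the configuration.

```python
def switch_context(world, caller, recipient, name, msg):
    # Assumed behaviour: make the recipient current and remember the old msg.
    world["contractStack"].insert(0, recipient)
    world["msgStack"].insert(0, world["Msg"])
    world["Msg"] = msg


def internal_call(world, name, args):
    # functionCall(F; Es): look the function up in the current instance.
    current = world["contractStack"][0]
    return world["instances"][current]["functions"][name](world, *args)


def return_context(world, recipient):
    # Assumed behaviour: restore the caller as the current instance.
    assert world["contractStack"][0] == recipient
    world["contractStack"].pop(0)
    world["Msg"] = world["msgStack"].pop(0)


def function_call(world, caller, recipient, name, args, msg):
    """functionCall(C; R; F; Es; M) decomposed into its three sub-steps."""
    switch_context(world, caller, recipient, name, msg)
    result = internal_call(world, name, args)
    return_context(world, recipient)
    return result


def whoami(world):
    return world["contractStack"][0], world["Msg"]["sender"]


world = {"contractStack": [-1], "msgStack": [], "Msg": None,
         "instances": {0: {"functions": {"whoami": whoami}}}}
msg = {"sender": -1, "recipient": 0, "value": 0, "gas": 0}
result = function_call(world, -1, 0, "whoami", [], msg)
print(result, world["contractStack"])
```

An internal call is the special case in which caller and recipient coincide, so the same function covers both kinds of calls.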


Particularly, the semantics of function calls is equipped with exception han- 
dling features. If an exception is encountered, it will be propagated to the trans- 
actional function call to revert the whole transaction. The propagation of excep- 
tions is a sub-step in returnContext. The exception handling mechanism is also 
general, making it possible to deal with all kinds of exception handling features 
in smart contracts, such as revert and assert in Solidity, in a similar way. 


RULE EXCEPTION-PROPAGATION
  <k> exception() => updateExceptionState() ... </k>
  <contractStack> ListItem(R) ListItem(C) ... </contractStack>
  requires C >=Int 0

RULE TRANSACTION-REVERSION
  <k> exception() => updateExceptionState() ~> revertState() ... </k>
  <contractStack> ListItem(R) ListItem(-1) </contractStack>

There are two stages in handling exceptions. The first one is the propagation 
of exceptions to the transactional function call as shown in EXCEPTION-PROPAGATION, 
and the second is the reversion of the transaction as shown in 
TRANSACTION-REVERSION. The first stage is present in nested calls to propagate 
exceptions to the transactional function call, while the second stage is only 
present in the transactional function call stemming from the “Main” contract. 
In the stage of propagating exceptions, the exception state is updated through 
updateExceptionState() to indicate that an exception has been encountered. 
Particularly, the Id of the caller instance should be larger than or equal to 0 since 
the caller cannot be the “Main” contract. And in the stage of reverting transac- 
tions, the caller is the “Main” contract whose Id is “-1”. In addition to updating 
the exception state, the whole transaction is reverted through revertState(). 
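
The two stages can be illustrated with the following Python sketch. It is an analogy built on Python exceptions: the snapshot taken at the start of a transactional call, the world dictionary and the helper names are assumptions of this sketch and do not reflect how revertState() is implemented.

```python
import copy


class ContractException(Exception):
    pass


def call(world, caller, recipient, body):
    """Run `body` in `recipient`; revert everything if the caller is "Main" (-1)."""
    snapshot = copy.deepcopy(world["storage"]) if caller == -1 else None
    world["contractStack"].insert(0, recipient)
    try:
        body(world)
    except ContractException:
        world["exception"] = True        # role of updateExceptionState()
        if caller >= 0:
            raise                        # EXCEPTION-PROPAGATION to the outer call
        world["storage"] = snapshot      # TRANSACTION-REVERSION: revertState()
    finally:
        world["contractStack"].pop(0)


def inner(world):
    world["storage"]["b"] = 2
    raise ContractException("failure in the nested call")


def outer(world):
    world["storage"]["a"] = 1
    call(world, 0, 1, inner)             # nested call from instance 0 to instance 1


world = {"contractStack": [-1], "storage": {}, "exception": False}
call(world, -1, 0, outer)                # transactional call from "Main"
print(world)
```

Both writes are undone although only the nested call failed, because the exception reaches the transactional call issued by the "Main" contract.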


4 Direct Semantics Generation 


A direct semantics of a high-level smart contract programming language can be 
developed based on the general semantic model introduced above. From the per- 
spective of rewriting logic, a language semantics is a set of rewriting steps from 
the language syntax to its evaluations. Each of these rewriting steps implements 
a function to move the syntax a step further to its final evaluations. The general 
semantic model which consists of a set of internal rewriting steps and defines the 
desired semantics of smart contracts can be regarded as a logical intermediate 
language. A direct semantics of a high-level smart contract programming lan- 
guage can be constructed by rewriting its syntax to the features in the general 
semantic model with several functional steps. This also indicates the process of 
smart contract language design. We take Solidity as an example to illustrate how 
to generate the semantics based on the general semantic model. The semantics 
rules presented below are based on the Solidity syntax defined in [7]. 

Let us start with the look-up operation in Solidity. As shown in LOOK-UP, 
the object is considered to be a variable X. X is evaluated with read in the general 
semantic model. We simply rewrite the corresponding Solidity syntax to read. 
Assignment is similar to look-up. As shown in ASSIGNMENT, we simply rewrite 
the assignment syntax in Solidity to write in the general semantic model. 


RULE LOOK-UP
  <k> X:Id => read(X) ... </k>

RULE ASSIGNMENT
  <k> X:Id = V:Value => write(X, V) ... </k>

RULE NEW-INSTANCE-SOLIDITY
  <k> new X:Id (E:ExpressionList) => createNewInstance(X, E) ... </k>


Both state and local variable allocations are achieved through allocate in 
the general semantic model. State variables are allocated when new contract 
instances are created, while local variables are allocated right after declarations. 

In NEW-INSTANCE-SOLIDITY, the syntax of new contract instance creations 
in Solidity is rewritten to createNewInstance in the general semantic model. 

Function calls in Solidity are written in a format similar to member access. 
For instance, target.deposit.value(2)() is a typical function call in Solidity. 
To be specific, target specifies the recipient instance and deposit is the function 
RULE FUNCTION-CALL-SOLIDITY
  <k> #memberAccess(R:Int, F:Id) ~> Es:Values ~> MsgValue:Int ~> MsgGas:Int
      => functionCall(C; R; F; Es; #msgInfo(C, R, MsgValue, MsgGas)) ... </k>
  <contractStack> ListItem(C:Int) ... </contractStack>

RULE REVERT
  <k> revert(.ExpressionList); => exception() ... </k>

RULE ASSERT
  <k> assert(true); => . ... </k>
  <k> assert(false); => exception() ... </k>

RULE REQUIRE
  <k> require(true); => . ... </k>
  <k> require(false); => exception() ... </k>



to be called in that instance. value specifies msg.value as 2. In addition, we 
can specify other parameters, such as msg.gas, function arguments, etc. When 
it comes to the semantics of function calls in Solidity, the first thing to do 
is to decompose the member-access-like format and transform it into the one 
in the general semantic model. As shown in FUNCTION-CALL-SOLIDITY, each 
decomposed part of a Solidity call is reorganized in functionCall. Specifically, 
#memberAccess(R:Int, F:Id) specifies the recipient instance R and 
the function to be called in this instance F. Es specifies the function arguments. 
MsgValue and MsgGas represent msg.value and msg.gas, respectively. 

The semantics rules for function calls apply to all kinds of function calls in 
Solidity, including high-level and low-level calls, constructors and fallback func- 
tions. For instance, if there is no function name specified in a function call or 
the specified function name does not match any existing function in the recip- 
ient instance, the first decomposed part in FUNCTION-CALL-SOLIDITY will be 
#memberAccess(R:Int, String2Id("fallback")) where R is the Id of the re- 
cipient instance and “fallback” refers to the fallback function in that instance. 
In this case, the fallback function in R will be invoked. In addition, in the case 
of delegatecall, the recipient instance R is the same as the caller instance C 
since the execution takes place in the caller’s context. 
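The fallback resolution described above can be modelled outside the K framework. The following is only a minimal Java sketch of that dispatch decision, not part of the semantics itself: the class FallbackDispatch, the method resolve, and the set of declared function names are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the dispatch decision: which function of the recipient
// instance a call resolves to. This is not the K rule itself.
class FallbackDispatch {
    static final String FALLBACK = "fallback";

    // Returns the requested name if the recipient declares it; otherwise the
    // call resolves to the fallback function (also when no name is given).
    static String resolve(Set<String> declared, String requested) {
        if (requested != null && declared.contains(requested)) {
            return requested;
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        Set<String> declared = new HashSet<>(Arrays.asList("deposit", "withdraw"));
        System.out.println(resolve(declared, "deposit"));  // declared function
        System.out.println(resolve(declared, "transfer")); // no match
        System.out.println(resolve(declared, null));       // no name specified
    }
}
```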

Exception handling features in Solidity can be interpreted with the semantics 
of exception() in the general semantic model. The semantics rules for revert, 
assert and require are shown in REVERT, ASSERT and REQUIRE, respectively. 


5 Evaluation 


We evaluate the proposed generalized formal semantic framework for smart con- 
tracts by showing that the generated semantics, an interpretation of the general 
semantic model with a particular language, is consistent with the semantics 
interpreted by the corresponding official compiler on benchmarks. The testing 
language makes no difference to the evaluation since it aims to validate the 
semantics of the commonly shared high-level features defined in the general se- 
mantic model. We take Solidity as an object for the evaluation since there are 
sufficient Solidity smart contracts available for testing the generated Solidity 
Table 1. Coverage of the Generated Solidity Semantics 

Features                    Coverage    Features                          Coverage 

Types (Core)                            Statements (Core) 
  Elementary Types                        If Statement                    FC 
    address                 FC            While Statement                 FC 
    bool                    FC            For Statement                   FC 
    string                  FC            Block                           FC 
    Int                     FC            Inline Assembly Statement       N 
    Uint                    FC 
    Byte                    FC            Do While Statement              FC 
    Fixed                   N             Place Holder Statement          FC 
    Ufixed                  N             Continue                        FC 
  User-defined Types        FC            Break                           FC 
  Mappings                  FC            Return                          FC 
  Array Types               FC            Throw, Revert, Assert, Require  FC 
  Function Types            FC            Simple Statement                FC 
  address payable           FC            Emit Statement                  FC 

Functions (Core)                        Expressions (Core) 
  Function Definitions                    Bitwise Operations              FC 
    Constructors            FC            Arithmetic Operations           FC 
    Normal Functions        FC            Logical Operations              FC 
    Fallback Functions      FC            Comparison Operations           FC 
    Modifiers               FC            Assignment                      FC 
  Function Calls                          Look Up                         FC 
    Internal Function Calls FC            New Expression                  FC 
    External Function Calls FC            Other Expressions               FC 

Using For                   FC          Inheritance                       FC 
Event                       FC 

FC: Fully Covered and Consistent with Solidity IDE    N: Not Covered 


semantics. The Solidity semantics developed with the proposed framework is 
publicly available at https://github.com/kframework/solidity-semantics. 

The generated Solidity semantics is evaluated from two perspectives: the 
first one is its coverage (i.e., completeness), and the second is its correctness 
(i.e., consistency with Solidity compilers). Evaluation results show that the So- 
lidity semantics developed with the proposed framework completely covers the 
supported high-level core language features specified by the official Solidity doc- 
umentation [7] and is consistent with the official Solidity compiler Remix [5]. 

We evaluate and test the Solidity semantics developed with the proposed 
framework with the Solidity compiler test set [6]. This test set is regarded as 
a standard test set or benchmarks for evaluating Solidity semantics since the 
test programs are written in a standard or correct way defined by the language 
developers and cover all the features in Solidity. There are altogether 482 tests 
in the Solidity compiler test set. The evaluation is done by manually comparing 
the execution behaviours of the generated Solidity semantics with the ones of 
the Remix compiler on the test programs. We consider the generated Solidity 
semantics is correct if the execution behaviours indicated in the configuration 
are consistent with the ones of the Remix compiler. A feature is considered to 
be fully covered if all the compiler tests involving this feature are passed. We list 
the coverage of the generated Solidity semantics in Table 1 from the perspective 
of each feature specified by the official documentation. 

From Table 1, we can observe that the generated Solidity semantics com- 
pletely covers the supported high-level core features of Solidity. As for types, the 
generated Solidity semantics covers the following elementary types: address, 
bool, string, Int, Uint and Byte. Fixed and Ufixed are not covered because 
they are not fully supported by Solidity yet [7]. User-defined types, including 
struct, contract types and enum, are covered. Mappings, arrays, function types 
and address payable are also covered. In addition, the semantics associated 
with functions, such as function definitions and function calls, is fully cov- 
ered. The semantics of statements is completely covered except that of inline 
assembly statements which are considered to be low-level features accessing 
EVM (i.e., this part of semantics can be integrated with KEVM [24]). All kinds 
of expressions in Solidity are covered. Lastly, the semantics of event is also cov- 
ered and the parts of semantics for using for and inheritance are covered 
with rewriting. All covered parts of the semantics are considered to be 
correct since the execution behaviours involved are consistent with those of 
Remix. Therefore, the generated Solidity semantics can be considered complete 
and correct with respect to the supported high-level core features of Solidity, 
indicating the completeness and correctness of the general semantic model. 

Threats to Validity. We validate the general semantic model with its in- 
terpretation in Solidity. The validity of the proposed framework holds for any 
particular high-level smart contract programming language as long as its core 
features fall into or can be properly rewritten to the ones defined in the gen- 
eral semantic model. The proposed framework may not work if the core features 
cannot be interpreted with the ones defined in the general semantic model. How- 
ever, this is unlikely due to the nature of smart contract executions. For instance, 
transactions in existing instances are implemented with or can be transformed 
into function calls regardless of the platforms of smart contract programs. 


6 Conclusion 


In this paper, we propose a generalized formal semantic framework for smart con- 
tracts. This framework can directly handle smart contracts written in different 
high-level programming languages, such as Solidity, Vyper, Bamboo, etc., without 
translating them into EVM bytecode or intermediate languages. In this frame- 
work, a direct executable formal semantics of a particular high-level smart con- 
tract programming language is constructed based on a general semantic model 
with rewriting logic. The general semantic model is validated with its interpre- 
tation in Solidity and evaluation results show that it is complete and correct. 
Furthermore, the proposed framework provides a formal specification of smart 
contracts written in different languages. 
Abstract. Streaming APIs allow for big data processing of native data 
structures by providing MapReduce-like operations over these structures. 
However, unlike traditional big data systems, these data structures typi- 
cally reside in shared memory accessed by multiple cores. Although popu- 
lar, this emerging hybrid paradigm opens the door to possibly detrimental 
behavior, such as thread contention and bugs related to non-execution 
and non-determinism. This study explores the use and misuse of a popular 
streaming API, namely, Java 8 Streams. The focus is on how developers 
decide whether or not to run these operations sequentially or in parallel 
and bugs both specific and tangential to this paradigm. Our study in- 
volved analyzing 34 Java projects and 5.53 million lines of code, along 
with 719 manually examined code patches. Both automated methodologies, including 
interprocedural static analysis, and manual methodologies were employed. 
The results indicate that streams are pervasive, parallelization is not 
widely used, and performance is a crosscutting concern that accounted 
for the majority of fixes. We also present coincidences that both confirm 
and contradict the results of related studies. The study advances our 
understanding of streams, as well as benefits practitioners, programming 
language and API designers, tool developers, and educators alike. 


Keywords: empirical studies · functional programming · Java 8 · streams 
· multi-paradigm programming · static analysis. 


1 Introduction 


Streaming APIs are widely available in today’s mainstream Object-Oriented 
programming (MOOP) languages and platforms [5], including Scala [14], 
JavaScript [44], C# [33], F# [47], Java [39], and Android [27]. These APIs allow for 
“big data”-style processing of native data structures by incorporating MapReduce- 
like [10] operations. A “sum of even squares” example in Java, where a stream of 
numbers is derived from a list, filtered for evens, mapped to their squares, and 
summed [5], is: list.stream().filter(x -> x % 2 == 0).map(x -> x * x).sum(). 
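As a runnable illustration of the “sum of even squares” pipeline, a minimal sketch follows. It assumes the list holds Integer values; because Stream<Integer> has no sum() method, the sketch uses mapToInt() in place of map() so that the resulting IntStream can be summed. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class EvenSquares {
    // Sums the squares of the even elements of the list.
    static int sumOfEvenSquares(List<Integer> list) {
        return list.stream()
                .filter(x -> x % 2 == 0) // keep the even numbers
                .mapToInt(x -> x * x)    // square each one; IntStream provides sum()
                .sum();
    }

    public static void main(String[] args) {
        // 2*2 + 4*4 = 20
        System.out.println(sumOfEvenSquares(Arrays.asList(1, 2, 3, 4)));
    }
}
```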


© The Author(s) 2020 
H. Wehrheim and J. Cabot (Eds.): FASE 2020, LNCS 12076, pp. 97-118, 2020. 
https://doi.org/10.1007/978-3-030-45234-6_5 


98 R. Khatchadourian et al. 


Traditional big data systems, for which MapReduce is a popular backbone [3], 
minimize the complexity of writing massively distributed programs by facilitating 
processing on multiple nodes using succinct functional-like constructs. This makes 
writing parallel code easier, as writing such code can be difficult due to possible 
data races, thread interference, and contention [1,4,28]. The code above, e.g., can 
execute in parallel simply by replacing stream() with parallelStream(). 
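The switch between execution modes can be sketched as follows. This is an illustrative example under the assumption of an Integer list and an integer sum; the class and method names are hypothetical. Because integer addition is associative and the lambdas are free of side-effects, both variants yield the same result.

```java
import java.util.Arrays;
import java.util.List;

class ParallelEvenSquares {
    // Sequential variant: operations execute serially.
    static int sequential(List<Integer> list) {
        return list.stream()
                .filter(x -> x % 2 == 0)
                .mapToInt(x -> x * x)
                .sum();
    }

    // Parallel variant: the only change is stream() -> parallelStream().
    static int parallel(List<Integer> list) {
        return list.parallelStream()
                .filter(x -> x % 2 == 0)
                .mapToInt(x -> x * x)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sequential(numbers) == parallel(numbers));
    }
}
```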

However, unlike traditional big data systems, data structures processed by 
streaming APIs like Java 8 Streams typically reside in shared memory accessed 
by multiple cores. Therefore, issues may arise from the close intimacy between 
shared memory and the operations being performed, especially for developers not 
previously familiar with functional programming. Streams are not just an API 
but rather an emerging, hybrid paradigm. To obtain the expressiveness, speed, 
and parallelism that streams have to offer, developers must adopt the paradigm 
as well as the API [6, Ch. 7]. This requires determining whether running stream 
code in parallel yields an efficient yet interference-free program [24] and ensuring 
that no operations on different threads interleave [42]. 

Despite the benefits [53, Ch. 1], misusing streams may result in detrimental 
behavior, and the ~4K questions related to streams on Stack Overflow [48], of 
which ~5% remain unanswered, suggest that there is ample confusion surrounding 
the topic. Bugs related to thread contention (due to λ-expressions, i.e., units of 
computation, side-effects, buffering), non-execution (due to deferred execution), 
non-determinism (due to non-deterministic operations), operation sequencing 
(ordering of stream operations), and data ordering (ordering of stream data) can 
lead to programs that undermine concurrency, underperform, are incorrect, and 
are inefficient. Worse yet, these problems may increase over time as streams rise 
in popularity, with Mazinanian et al. [32] finding a two-fold increasing trend in 
the adoption of λ-expressions, an essential part of streams. 
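Non-execution due to deferred execution can be demonstrated with a small sketch. The example below is illustrative only (class and method names are hypothetical): an intermediate operation such as peek() does nothing until a terminal operation is attached to the pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class DeferredExecution {
    // Builds a pipeline but never issues a terminal operation,
    // so the action passed to peek() is never run.
    static List<Integer> withoutTerminal(List<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        Stream<Integer> pipeline = source.stream().peek(seen::add); // nothing executes here
        return seen; // still empty
    }

    // Same pipeline, now driven by the terminal operation forEach().
    static List<Integer> withTerminal(List<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        source.stream().peek(seen::add).forEach(x -> { });
        return seen; // contains every element of the sequential stream
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);
        System.out.println(withoutTerminal(data));
        System.out.println(withTerminal(data));
    }
}
```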

This study explores the use and misuse of a popular and representative 
streaming API, namely, Java 8 Streams. We set out to understand the usage and 
bug patterns involving streams in real software. Particularly, we are interested 
in discovering (i) how developers decide whether to run streams sequentially or 
in parallel, (ii) common stream operations, (iii) common stream attributes and 
whether they are amenable to safe and efficient parallelization, (iv) bugs both 
specific and tangential to streams, (v) how often incorrect stream APIs were 
used, and (vi) how often stream APIs were misused and in which ways. 

Knowing the kinds of bugs typically associated with streams can, e.g., help 
improve (automated) bug detection. Being aware of the typical usage patterns 
of streams can, e.g., improve code completion in IDEs. In general, the results 
(i) advance our understanding of this emerging hybrid paradigm, (ii) provide 
feedback to language and API designers for future API versions, (iii) help tool 
designers comprehend the struggles developers have with streams, (iv) propose 
preliminary best practices and anti-patterns for practitioners to use streaming 
APIs effectively, and (v) assist educators in teaching streaming APIs. 

We analyzed 34 Java projects and 5.53 million lines of source code (SLOC), 
along with 140,446 code patches (git commits), of which 719 were manually 
Listing 1 Snippet of Widget collection processing using Java 8 streams [24,39]. 

 1  Collection<Widget> unorderedWidgets = new HashSet<>(); // populate 
 2  Collection<Widget> orderedWidgets = new ArrayList<>(); // populate 
 3  List<Widget> sortedWidgets = unorderedWidgets.stream() 
 4    .sorted(Comparator.comparing(Widget::getWeight)).collect(Collectors.toList()); 
 5  // collect weights over 43.2 into a set in parallel. 
 6  Set<Double> heavyWidgetWeightSet = orderedWidgets.parallelStream().map(Widget::getWeight) 
 7    .filter(w -> w > 43.2).collect(Collectors.toSet()); 
 8  // sequentially collect into a list, skipping first 1000. 
 9  List<Widget> skippedWidgetList = orderedWidgets.stream().skip(1000) 
10    .collect(Collectors.toList()); 


examined. The methodologies varied depending on the research questions and 
encompassed both automated processes, including interprocedural static analysis, 
and manual processes aided by automated software repository mining. Our study 
indicates that (i) streams have become widely used since their inception in 2014, 
(ii) developers tend to reduce streams back to iterative-style collections, favor 
simplistic, linear reductions, and prefer deterministic operations, (iii) stream 
parallelization is not widely used, yet streams tend not to have side-effects, 
and (iv) performance is the largest category of stream bugs and is crosscutting. 

This work makes the following contributions: 

Stream usage patterns. A large-scale analysis of stream and collector method 
calls and an interprocedural static analysis on 1.65 million lines of source code 
are performed, reporting on attributes essential to efficient parallel execution. 

Stream bug hierarchical taxonomy. From the 719 git patches from 22 projects 
manually examined using 140 identifying keywords, we build a rich hierarchi- 
cal, crosscutting taxonomy of common stream bugs and fixes. 

Best practices and anti-patterns. We propose preliminary best practices and 
anti-patterns of using streams in particular contexts from our statistical results 
as well as an in-depth analysis of first-hand conversations with developers. 


2 Motivating Example and Conceptual Background 


Lst. 1 portrays code that uses the Java 8 Stream API to process collections 
of Widgets (class not shown) with colors and weights. A Collection of Widgets 
is declared (line 1) that does not maintain element ordering as HashSet does not 
support it [38]. Note that ordering is dependent on the run time type. 

A stream (a view representing element sequences supporting MapReduce-style 
operations) of unorderedWidgets is created on line 3. It is sequential, i.e., its 
operations will execute serially. Streams may also have an encounter order that 
may depend on its source. Here, it is unordered since HashSets are unordered. 

On line 4, the stream is sorted by the corresponding intermediate oper- 
ation, the result of which is a stream with the encounter order rearranged. 
Widget::getWeight is a method reference denoting the comparison scheme. In- 
termediate operations are deferred until a terminal operation is executed like 
collect() (line 4). The collect() operation is a (mutable) reduction that aggre- 
gates results of prior intermediate operations into a given Collector. In this case, 
it is one that yields a List. The result is a Widget List sorted by weight. 
To potentially improve performance, this stream’s “pipeline” (sequence of 
operations) may be executed in parallel. Note, however, that had the stream 
been ordered, running the pipeline in parallel may result in worse performance 
due to the multiple passes or data buffering required by stateful intermediate 
operations (SIOs) like sorted(). Because the stream is unordered, the reduction 
can be done more efficiently as the run time can use divide-and-conquer [39]. 

In contrast, line 2 instantiates an ArrayList, which maintains element ordering. 
Furthermore, a parallel stream is derived from this collection (line 6), with each 
Widget mapped to its weight, each weight filtered (line 7), and the results 
collected into a Set. Unlike the previous example, however, no optimizations are 
available here as an SIO is not included in the pipeline and, as such, the parallel 
computation does not incur possible performance degradation. 

Lines 9-10 create a list of Widgets gathered by (sequentially) skipping the first 
thousand from orderedWidgets. Like sorted(), skip() is also an SIO. Unlike the 
previous example, executing this pipeline in parallel could be counterproductive 
because the stream is ordered. It may be possible to unorder the stream (via 
unordered()) so that its pipeline would be more amenable to parallelization. In 
this situation, however, unordering could alter semantics as the data is assembled 
into a structure maintaining ordering. As such, the stream correctly executes 
sequentially as element ordering must be preserved. 
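The three pipelines discussed in this section can be made runnable with a small sketch. The Widget class below is a hypothetical stand-in, since the paper does not show it, and the method names are introduced only for illustration; the stream operations themselves mirror those described in the text.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class WidgetPipelines {
    // Hypothetical stand-in for the Widget class (not shown in the paper).
    static final class Widget {
        private final double weight;
        Widget(double weight) { this.weight = weight; }
        double getWeight() { return weight; }
    }

    // Sequential pipeline with the stateful intermediate operation sorted().
    static List<Widget> sortedByWeight(Collection<Widget> widgets) {
        return widgets.stream()
                .sorted(Comparator.comparing(Widget::getWeight))
                .collect(Collectors.toList());
    }

    // Parallel pipeline without a stateful intermediate operation.
    static Set<Double> heavyWeights(Collection<Widget> widgets) {
        return widgets.parallelStream()
                .map(Widget::getWeight)
                .filter(w -> w > 43.2)
                .collect(Collectors.toSet());
    }

    // Sequential pipeline with the stateful intermediate operation skip().
    static List<Widget> skipFirst(Collection<Widget> widgets, long n) {
        return widgets.stream().skip(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Widget> widgets = new ArrayList<>();
        widgets.add(new Widget(50.0));
        widgets.add(new Widget(10.0));
        widgets.add(new Widget(45.0));
        System.out.println(sortedByWeight(widgets).get(0).getWeight());
        System.out.println(heavyWeights(widgets));
        System.out.println(skipFirst(widgets, 1).size());
    }
}
```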

This simplified example demonstrates that using streams effectively is not 
always straightforward and can require complex (and interprocedural due to 
aliasing) analysis. It necessitates a thorough understanding of API intricacies, a 
problem that can be compounded in more extensive programs. As streaming APIs 
become more pervasive, it would be extremely valuable to MOOP developers not 
familiar with functional programming if statistical insight can be given on how 
best to use streams efficiently and how to avoid common bugs. 


3 Study Subjects 


At the core of our study are 34 open source Java projects that use streams. They 
vary widely in their domain and application, as well as size and popularity. 
All the subjects have their sources publicly available on GitHub and include 
popular libraries, frameworks, and applications. Many subjects were selected from 
previous studies [20,21,22,24], others because they contained relatively diverse 
stream operations and exhibited non-trivial metrics, including stars, forks, and 
number of collaborators. It was necessary to use different subjects for different 
parts of the study due to the computationally intensive nature of some of the 
experiments. For such experiments, subjects were chosen so that the analysis 
could be completed in a reasonable time period with reasonable resources. 


4 Stream Characteristics 


We explore the typical usage patterns of streams, including the frequency of 
parallel vs. sequential streams and amenability to safe and efficient parallelism, 
by examining stream characteristics. This has important implications for under- 
standing the use of this incredibly expressive and powerful language feature. It 
also offers insight into developers’ perceived risks concerning parallel streams. 
Table 1. Stream characteristics. 

subject          KLOC     age     eps  k    str    seq  para  ord  unord  se  SIO 
bootique         4.91    4.18     362  4     14     14     0   11      3   4    0 
cryptomator      7.99    6.05     148  3     12     12     0   11      1   2    0 
dari            64.86    5.43       3  2     18     18     0   15      3   0    0 
elasticsearch  585.71   10.03      78  6    210    210     0  165     45  10    0 
htm.java        41.14    4.53      21  4    190    188     2  189      1  22    5 
JabRef         138.83   16.36   3,064  2    301    290    11  239     62   9    0 
JacpFX          23.79    4.71     195  4     12     12     0    9      3   1    0 
jdp*            19.96    5.53      25  4     38     38     0   35      3  11    1 
jdk8-exp*        3.43    6.35      34  4     49     49     0   47      2   5    0 
jetty          354.48   10.93     106  4     57     57     0   47     10   8    0 
JetUML          20.95    5.09     660  2      7      7     0    4      3   0    0 
jOOQ           154.01    8.58      43  4     23     23     0   22      1   2    0 
koral            7.13    3.47      51  3      8      8     0    8      0   0    0 
monads           1.01    0.01      47  2      3      3     0    3      0   0    0 
retrolambda      5.14    6.52       1  4     11     11     0    8      3   0    0 
spring*        188.46   11.62   5,981  4     61     61     0   60      1  21    0 
streamql         4.01    0.01      92  2     22     22     0   22      0   2   18 
threeten*       27.53    7.01      36  2      2      2     0    2      0   0    0 

Total        1,653.35  116.40  11,047  6  1,038  1,025    13  897    141  97   24 


* jdp is java-design-patterns, jdk8-exp is jdk8-experiments, spring is a 
portion of spring-framework, and threeten is threeten-extra. 


4.1 Methodology 


For this part of the study, we examined 18 projects that use streams,⁵ spanning 
~1.65 million lines of Java source code. The subjects are depicted in tab. 1. 
Column KLOC corresponds to thousands of source lines of code, which ranges 
from ~1K for monads to ~586K for elasticsearch. Column age is the age of the 
subject project in years, averaging 6.47 years per subject. Column str is the 
total number of streams analyzed. The remaining columns are discussed in § 4.2. 


Stream Pipeline Tracking Several factors contribute to determining stream 
attributes. First, streams are typically derived from a source (e.g., a collection) 
and take on its characteristics (e.g., ordering), as seen in lst. 1. There are 
several ways to create streams, including being derived from Collections, being 
created from arrays (e.g., Arrays.stream()), and via static factory methods (e.g., 
IntStream.range()). Second, stream attributes can change by the invocation of 
various intermediate operations in the building of the stream pipeline. Such 
attributes must be tracked, as it is possible to have arbitrary assignments of 
stream references to variables, as well as be data-dependent. 

Our study involved tracking streams and their attributes (i.e., state) using 
a series of labeled transition systems (LTSs). The LTSs are fed into the static 
analysis portion of a refactoring tool [23] based on typestate analysis [16,49]. 
Stream pipelines are tracked and stream state when a terminal operation is issued 
is determined by the tool. Typestate analysis is a program analysis that augments 
the type system with “state” information and has been traditionally used for 
prevention of program errors such as those related to resource usage. It works 
by assigning each variable an initial (⊥) state. Then, method calls transition 
the object’s state. States are represented by a lattice and possible transitions 


5 Recall from § 3 that it was necessary to use different subjects for different parts of 
the study due to the computationally intensive nature of some of the experiments. 
are represented by LTSs. If each method call sequence on the receiver does not 
eventually transition the object back to the ⊥ state, the object may be left in a 
nonsensical state, indicating the potential presence of a bug. 

The LTSs for execution mode and ordering work as follows. The state ⊥ is 
a phantom initial state immediately before stream creation. Different stream 
creation methods may transition the newly created stream to one that is either 
sequential or parallel or ordered or unordered. The transition continues for each 
invoked intermediate operation and ends with a terminal operation. 
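To make the idea concrete, the two LTSs can be sketched as small state machines driven by method names. This is a deliberately simplified illustration and not the refactoring tool's implementation: the enum names, the handful of operations modelled, and the sourceOrdered flag (standing in for the ordering of the stream's source) are all assumptions.

```java
import java.util.Arrays;
import java.util.List;

class StreamStateLts {
    enum Mode { BOTTOM, SEQUENTIAL, PARALLEL }
    enum Ordering { BOTTOM, ORDERED, UNORDERED }

    // Execution-mode LTS: creation methods leave BOTTOM; parallel()/sequential() switch mode.
    static Mode step(Mode current, String op) {
        switch (op) {
            case "stream":
            case "sequential":
                return Mode.SEQUENTIAL;
            case "parallelStream":
            case "parallel":
                return Mode.PARALLEL;
            default:
                return current; // other operations keep the mode
        }
    }

    // Ordering LTS: creation inherits the source's ordering; sorted()/unordered() change it.
    static Ordering step(Ordering current, String op, boolean sourceOrdered) {
        switch (op) {
            case "stream":
            case "parallelStream":
                return sourceOrdered ? Ordering.ORDERED : Ordering.UNORDERED;
            case "sorted":
                return Ordering.ORDERED;
            case "unordered":
                return Ordering.UNORDERED;
            default:
                return current; // other operations keep the ordering
        }
    }

    // State when the last (terminal) operation of the pipeline is reached.
    static Mode finalMode(List<String> ops) {
        Mode mode = Mode.BOTTOM;
        for (String op : ops) {
            mode = step(mode, op);
        }
        return mode;
    }

    static Ordering finalOrdering(List<String> ops, boolean sourceOrdered) {
        Ordering ordering = Ordering.BOTTOM;
        for (String op : ops) {
            ordering = step(ordering, op, sourceOrdered);
        }
        return ordering;
    }

    public static void main(String[] args) {
        // Pipeline of Listing 1, lines 3-4, over an unordered HashSet source.
        List<String> ops = Arrays.asList("stream", "sorted", "collect");
        System.out.println(finalMode(ops) + " " + finalOrdering(ops, false));
    }
}
```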

Since the analysis is focused on client-side analysis of stream APIs, the call 
graph is constructed using k-CFA, where k is the call string length. It is an 
analysis parameter: the default is k = 2 for methods returning streams, as this 
is the minimum k needed to consider client code, and k = 1 elsewhere. The 
refactoring tool includes heuristics for determining a sufficient and tractable k. 


Counting Streams Since stream attributes are control flow sensitive, the 
streams studied must be in the control flow of entry points. For non-library 
subjects, all main methods were chosen, otherwise, all unit tests were chosen. 

Streams are counted as follows. First, every syntactic stream is counted, i.e., 
every allocation site. Streams in the control flow of the program starting from an 
entry point transition according to the LTSs. If a stream is not in the control 
flow, it is still counted but it remains at the state following ⊥. This way, more 
information about various stream attributes is available for the study as we do 
not need control flow to determine the state following ⊥. 


Side-effects and Stateful Intermediate Operations Stream side-effects 
are determined using a ModRef analysis on stream operation parameters 
(λ-expressions) using WALA [52]. SIOs are obtained from the documentation [39]. 


4.2 Results 


Tab. 1 illustrates our findings on stream characteristics. Column eps is the 
number of entry points. Column k is the maximum k value used (see § 4.1). 
Columns seq and para correspond to the number of sequential and parallel 
streams, respectively. Column ord is the number of streams that are ordered, 
i.e., those whose operations must maintain an encounter order, which can be 
detrimental to efficient parallel performance, while column unord is the number 
of unordered streams. Column se is the number of stream pipelines that include side- 
effects, which may induce race conditions. Finally, column SIO is the number 
of pipelines that include stateful intermediate operations, which may also be 
detrimental to efficient parallel performance. 


4.3 Discussion 


Parallel streams are not popular (1.25%) despite their ease-of-use. Although 
Nielebock et al. [36] did not consider λ-expressions in stream contexts, this confirms 
that their findings extend into stream contexts. It may also coincide with the 
finding of Lu et al. [28], i.e., that developers tend to “think” sequentially. 
Finding 1: Stream parallelization is not widely used. 
When considering using parallel streams, it may also be important to consider 
the context. For example, many server applications deal with thread pools that 
span the JVM, and developers may be leery of the interactions of such pools 
with the underlying stream parallelization run time system. We found this to be 
the case with several pull requests [15,45] that were issued by Khatchadourian 
et al. [24] as part of their refactoring evaluation to introduce parallel streams into 
existing projects. It may also be the case that the locations where streams operate 
are already fast enough or do not process significant amounts of data [7,30]. In 
fact, Naftalin [35, Ch. 6] found that there is a particular threshold in data size 
that must be reached to compensate for overhead incurred by parallel stream 
processing. Lastly, developers pointed us to several blog articles [54,59] expressing 
that parallel streams could be problematic under certain conditions. 

There were, however, two projects that use parallel streams. Particularly, 
JabRef used the most parallel streams at 11. We conjecture that JabRef’s use 
of parallel streams may stem from its status as a desktop application. Such 
applications typically are not managed by application containers and thus may 
not utilize global thread pools as in more traditional server applications. 

Many streams are ordered (86.42%), which can prevent optimal performance 
of parallel streams under certain conditions [24,35,40]. Thus, even if streams were 
run in parallel, they may not reap all of the benefits. This extends the findings 
of Nielebock et al. [36] that λ-expressions do not appear in contexts amenable to 
parallelization to streams for the case of ordering. Streams may still be amenable 
to parallelization, as § 5.2 shows that many streams are traversed using API that 
ignores ordering (e.g., forEach() vs. forEachOrdered()). 

Finding 2: Streams are largely ordered, possibly hindering parallelism. 
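The difference between the two traversal operations can be sketched as follows. The example is illustrative (class and method names are hypothetical): on a parallel stream, forEachOrdered() respects the encounter order of an ordered source, whereas forEach() makes no such guarantee and only promises that every element is visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OrderedTraversal {
    // Respects encounter order, even though the stream is parallel.
    static List<Integer> traverseOrdered(List<Integer> source) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(out::add);
        return out;
    }

    // Ignores encounter order; elements may arrive in any order.
    static List<Integer> traverseUnordered(List<Integer> source) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(traverseOrdered(data));   // same order as the source
        System.out.println(traverseUnordered(data)); // same elements, order not guaranteed
    }
}
```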

That only ~10% of streams have side-effects and only 2.31% have SIOs 
contradicts the findings of Nielebock et al. [36] in the context of streams. This 
suggests that streams may run efficiently in parallel as, although they are largely 
ordered, they include minimal side-effects and SIOs. streamql had the most 
streams with SIOs (18/22), which may be due to its querying features using 
aggregate operations that are manifested as SIOs in the Java 8 Streaming API 
(e.g., distinct()). 

Finding 3: Streams tend not to have side-effects. 
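The percentages quoted in this discussion follow from the Total row of Table 1 (1,038 streams, of which 13 are parallel, 897 ordered, 97 with side-effects, and 24 with SIOs). A minimal sketch of the arithmetic, with hypothetical class and method names:

```java
import java.util.Locale;

class StreamShares {
    // Share of `part` in `total`, as a percentage rounded to two decimals.
    static String percent(int part, int total) {
        return String.format(Locale.ROOT, "%.2f", 100.0 * part / total);
    }

    public static void main(String[] args) {
        int total = 1038; // column str, Total row
        System.out.println("parallel:     " + percent(13, total) + "%");  // 1.25
        System.out.println("ordered:      " + percent(897, total) + "%"); // 86.42
        System.out.println("side-effects: " + percent(97, total) + "%");  // 9.34
        System.out.println("SIOs:         " + percent(24, total) + "%");  // 2.31
    }
}
```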


5 Stream Usage 


We discover the common operations on streams and the underlying reasons by 
examining stream method calls. This has important implications in understanding 
how streams are used, and studying language feature usage has been shown to 
be beneficial [11,43]. It provides valuable insight to programming language API 
designers and tool-support engineers on where to focus their evaluation efforts. 
We may also comprehend contexts where developers struggle with using streams. 


5.1 Methodology 


We examined 34 projects that use streams, spanning ~5.53 million lines of source 
code. To find method calls, we parsed ASTs with source-symbol bindings using 
the Eclipse Java Developer Tools (JDT) [12]. Then, method invocation nodes 
were extracted whose compile-time targets are declared in types residing in the 
java.util.stream package. This includes types such as Streams and Collectors. 
While stream creation is interesting and a topic for future work, our focus is 
on operations on streams as our scope is stream usage. We also combined methods 
with similar functionalities, e.g., mapToLong() with map() but not forEach() and 
forEachOrdered(). Additionally, only the method name is presented, resulting in 
a comparison of methods from both streams and collectors. The type is clear from 
the method name (e.g., map() is for Streams, while groupingBy() is for Collectors). 
We then proceeded to count the number of method calls in each project. 
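The counting step can be sketched as below. This is not the JDT-based extraction itself: it assumes the method names have already been extracted into a list, and the exact set of names merged into map() (here the primitive-specialised mapTo* variants) is an assumption, since the text only gives mapToLong() as an example. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class CallCounter {
    // Merges methods with similar functionality under one name.
    // Assumption: all mapTo* variants count as map(); forEach() and
    // forEachOrdered() are intentionally kept apart.
    static String normalize(String name) {
        switch (name) {
            case "mapToInt":
            case "mapToLong":
            case "mapToDouble":
            case "mapToObj":
                return "map";
            default:
                return name;
        }
    }

    // Counts calls per normalized method name.
    static Map<String, Long> count(List<String> calls) {
        return calls.stream()
                .collect(Collectors.groupingBy(CallCounter::normalize, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> calls = Arrays.asList("map", "mapToLong", "filter", "forEach", "forEachOrdered", "map");
        System.out.println(count(calls));
    }
}
```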


5.2 Results 


Fig. 1 depicts the result of our analysis.⁶ A full table is available in our dataset [25]. 
The horizontal axis lists the method name, and the legend depicts projects 
analyzed. The chart is sorted by the total number of calls in descending order. 
Calls per project range from 4 for threeten-extra to 4,635 for cyclops. Calls per 
method range from 2 for characteristics(), which returns stream attributes 
such as whether it is ordered or parallel, and 3,161 for toList(). 


5.3 Discussion 


The number of method calls in fig. 1 is substantial. There are 14,536 calls to 
methods operating on streams in 34 projects. This is impressive considering that 
Android, which uses the Java syntax, did not adopt streams immediately. 

It is not surprising that the four most used stream methods are toList(), 
collect(), map(), and filter(), as these are the core MapReduce data transforma- 
tion operations. collect() is a specialized reduction that reduces to a non-scalar 
type (e.g., a map) as opposed to the traditional scalar type. The toList() method 
is a static method of Collectors, which are pre-made reductions, in this case, to 
an ArrayList. This informs the collect() operation of the non-scalar type to use. 
It is peculiar that there are more calls to toList() than collect(). This is due 
to cyclops. We conjecture that it has some unorthodox usages of Collectors as it 
is a platform for writing functional-style programs in Java > 8 [2]. 
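
To make the interplay of these four operations concrete, the following minimal sketch filters and maps a list of strings and then reduces the result to a List. The class and method names are hypothetical and chosen only for illustration; they do not come from any studied subject.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollectToListSketch {
    // Keeps the non-empty names, upper-cases them, and reduces the stream
    // to a List via the ready-made Collectors.toList() reduction.
    static List<String> upperCaseNonEmpty(List<String> names) {
        return names.stream()
                .filter(n -> !n.isEmpty())      // intermediate operation
                .map(String::toUpperCase)       // intermediate operation
                .collect(Collectors.toList());  // terminal, non-scalar reduction
    }

    public static void main(String[] args) {
        System.out.println(upperCaseNonEmpty(Arrays.asList("map", "", "filter")));
    }
}
```

Here toList() only describes the reduction; collect() is the terminal operation that actually runs the pipeline.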

That collect() and toList(), along with other terminal operations such 
as forEach(), iterator(), toSet(), and toArray(), appear towards the top of 
the list suggests that, although developers are writing functional-style code to 
process data in a “big data” processing style, they are not staying there. Instead, 
they are “bridging” back to imperative-style code, either by collecting data into 
imperative-style collections or processing the data further iteratively. 

There can be various reasons for this, such as unfamiliarity with functional 
programming, the need to introduce side-effects, or the need to interoperate with 
legacy code. Further investigation is necessary; yet, Nielebock et al. [36] mention 
that developers tend to introduce side-effects into λ-expressions, which is related. 

Finding 4: Although stream usage is high, developers tend to reduce 
streams back to iterative-style collections. 


⁶ Similar conclusions hold when normalizing with subject KLOC. 

We infer that developers tend to favor more simplistic (linear) rather than more 
specialized (higher-dimensionality) non-scalar reductions. It is surprising that 
more of the advanced reductions, such as those that return maps (e.g., toMap(), 
groupingBy()), are not used more frequently as these are highly expressive 
operations that can save substantial amounts of imperative-style code. For example, 
one may group Widgets by their Color as Map<Color, List<Widget>> widgetsByColor = 
widgets.stream().collect(Collectors.groupingBy(Widget::getColor)). Although 
these advanced reductions are powerful and expressive, developers may be leery 
of using them, perhaps due to unfamiliarity or risk aversion. This motivates 
future tools that refactor to uses of advanced reductions to save developers time 
and effort while possibly mitigating errors. 
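
The amount of imperative code that such a reduction can replace may be illustrated with a small sketch. Widget and Color below are hypothetical stand-ins for the types named in the example above, and the imperative variant is only one of several ways the same grouping could be written by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch {
    enum Color { RED, BLUE }

    // Hypothetical element type, used only for illustration.
    static final class Widget {
        final String name;
        final Color color;

        Widget(String name, Color color) {
            this.name = name;
            this.color = color;
        }

        Color getColor() {
            return color;
        }
    }

    // Imperative-style grouping: explicit map management and iteration.
    static Map<Color, List<Widget>> groupImperatively(List<Widget> widgets) {
        Map<Color, List<Widget>> byColor = new HashMap<>();
        for (Widget w : widgets) {
            byColor.computeIfAbsent(w.getColor(), c -> new ArrayList<>()).add(w);
        }
        return byColor;
    }

    // The same grouping expressed as a single advanced reduction.
    static Map<Color, List<Widget>> groupWithCollector(List<Widget> widgets) {
        return widgets.stream().collect(Collectors.groupingBy(Widget::getColor));
    }

    public static void main(String[] args) {
        List<Widget> widgets = Arrays.asList(
                new Widget("a", Color.RED), new Widget("b", Color.BLUE), new Widget("c", Color.RED));
        System.out.println(groupWithCollector(widgets).keySet());
    }
}
```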

Finding 5: Developers favor simplistic, linear reductions. 

Another powerful stream feature is its non-determinism. For instance, findAny() 
returns any stream element. However, this operation has only 62 calls, while its 
deterministic counterpart, findFirst(), has 270, suggesting that developers tend 
to favor determinism. Yet, in contrast, developers overwhelmingly favor the non- 
deterministic forEach() operation (552) over the deterministic forEachOrdered() 
(32). We conjecture that although forEach() does not guarantee a particular 
ordering [41], in practice, since developers are inclined to use sequential over 
parallel streams, as suggested by § 4 and mirrored by Nielebock et al. [36] in 
terms of λ-expressions, the difference does not play out. 
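
The distinction between the deterministic and non-deterministic variants can be sketched as follows. The class is hypothetical; it relies only on the documented guarantees that findFirst() and forEachOrdered() respect encounter order on an ordered stream, even when it is parallel, while findAny() may return any element.

```java
import java.util.Arrays;
import java.util.List;

class OrderingSketch {
    // forEachOrdered() processes elements one at a time in encounter order,
    // so appending to a StringBuilder here yields a predictable result.
    static String joinOrdered(List<Integer> values) {
        StringBuilder sb = new StringBuilder();
        values.parallelStream().forEachOrdered(sb::append);
        return sb.toString();
    }

    // findFirst() returns the first element in encounter order.
    static int first(List<Integer> values) {
        return values.parallelStream().findFirst().orElseThrow(IllegalStateException::new);
    }

    // findAny() only promises *some* element of the stream; which one
    // is returned on a parallel stream is deliberately unspecified.
    static boolean anyIsMember(List<Integer> values) {
        Integer any = values.parallelStream().findAny().orElseThrow(IllegalStateException::new);
        return values.contains(any);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(joinOrdered(values) + " " + first(values) + " " + anyIsMember(values));
    }
}
```

On a sequential stream, forEach() and findAny() typically behave like their ordered counterparts in practice, which is consistent with the conjecture above, although the API does not guarantee it.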

It could also be that traversal order is largely unimportant for many streams. 
This is curious because, as demonstrated in § 4, the majority of streams are 
ordered, an attribute detrimental to efficient parallelism [24,35,40]. As such, there 
may exist opportunities to alleviate the burden of stream ordering maintenance 
to make parallel streams more efficient. It may also entice developers to use more 
parallel streams as the performance gains may be significant. 

Finding 6: Developers prefer deterministic operations. 

Lastly, there is a minimal number of calls to parallel stream APIs. Of particular 
concern is that there are only 4 calls to groupingByConcurrent() in contrast to 
the 87 calls to groupingBy(). This suggests that either advanced reductions to 
maps are not being used on parallel streams or that they are not used safely 
as the concurrent version provides synchronization [37]. Furthermore, not using 
groupingByConcurrent() on a parallel stream may produce inefficient results [40]. 
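
As a hedged illustration of the two collectors, the sketch below groups words by length on a parallel stream in both ways. The class and method names are hypothetical. Both variants produce the same groups; the concurrent variant accumulates into one shared ConcurrentMap instead of merging per-thread maps, and it does not preserve encounter order within a group.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

class ConcurrentGroupingSketch {
    // Non-concurrent collector: partial maps are built and then merged.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingBy(String::length));
    }

    // Concurrent collector: all threads accumulate into one ConcurrentMap.
    static ConcurrentMap<Integer, List<String>> byLengthConcurrent(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingByConcurrent(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "d", "eee");
        System.out.println(byLength(words).keySet());
        System.out.println(byLengthConcurrent(words).keySet());
    }
}
```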


6 Stream Misuses 


This section is focused on discovering stream bug patterns. We are interested 
in bugs both specific and tangential to streams, i.e., bugs that occur in stream 
contexts. Understanding this can, e.g., help improve (automated) bug detection 
and other tool-support for writing optimal stream code. We may also begin to 
understand the kinds of errors developers make with streams, which may positively 
influence how future API and language feature versions are implemented. 


6.1 Methodology 


Here, we explore 22 projects that use streams, comprising ~4.68 million lines 
of source code and 140,446 git commits.⁷ Tab. 2 summarizes the subjects used. 

⁷ Recall from § 3 that it was necessary to use different subjects for different parts of 
the study due to the computationally intensive nature of some of the experiments. 
Table 2. Studied subjects. 


subject                   KLOC  studied periods              cmts    kws  exe

binnavi                 328.28  2015-08-19 to 2019-07-17      286      4    4
blueocean-plugin         49.70  2016-01-23 to 2019-07-24    4,043    118   25
bootique                 15.47  2015-12-10 to 2019-08-08    1,106      5    5
che                     189.24  2016-02-11 to 2019-08-19    8,093     75   75
cryptomator               9.83  2014-02-01 to 2019-08-08    1,443     50   10
dari                     72.46  2012-09-26 to 2018-03-02    2,466     18    6
eclipse.jdt.core      1,527.89  2001-06-05 to 2019-08-07   24,085    234  106
eclipse.jdt.ui          712.91  2001-05-02 to 2019-08-09   28,136    149   32
error-prone             165.85  2011-09-14 to 2019-08-15    3,893     71   71
guava                   393.47  2009-06-18 to 2019-08-15    5,031     36   36
htm.java                 41.63  2014-08-09 to 2019-02-19    1,507     40    1
JacpFX                   24.06  2013-08-12 to 2018-04-27      365     37   14
jdk8-experiments          3.47  2013-08-03 to 2018-03-10        8      1    1
java-design-patterns     33.52  2014-08-09 to 2019-07-31    2,192     37   12
jetty                   400.26  2009-03-16 to 2019-08-02   17,051    835  219
jOOQ                    184.25  2011-07-24 to 2019-07-31    7,508     94    4
qbit                     52.27  2014-08-25 to 2018-01-18    1,717     65    9
retrolambda               5.10  2013-07-20 to 2018-11-30      522     17    4
selenium                234.12  2004-11-03 to 2019-08-09   24,145    114   57
streamql                  4.26  2014-04-27 to 2014-04-29       27      2    2
threeten-extra           31.26  2012-11-17 to 2019-07-14      559     28    2
WALA                    203.84  2006-11-22 to 2019-07-24    6,263     52   24

Total                 4,683.12                            140,446  2,082  719


To find changesets (patches) corresponding to stream fixes, we compiled 140 
keywords from the API documentation [39] that match stream operations and 
related method names from the java.util.stream package. We then randomly 
selected a subset of these commits whose changesets included these keywords 
and were likely to be bug fixes to manually examine. 


Commit Mining To discover commits that had changesets including stream 
API keywords, we used gitcproc [9], a tool for processing and classifying git 
commits, which has been used in previous work [17,50]. Due to the keyword-based 
search used, not all of the examined commits pertained to streams (e.g., “map” 
has a broad range of applications outside of streams). To mitigate this, we focused 
more on keywords that were specific to stream contexts, e.g., “Collector.” Also 
to reduce false positives, we only considered commits after the Java 8 release 
date of March 18, 2014, which is when streams were introduced. 
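
The two filters described above, a keyword match on the changeset and a cutoff at the Java 8 release date, can be sketched as follows. This is not gitcproc's implementation or API; the Commit type, its fields, and the method names are hypothetical and serve only to illustrate the selection criteria.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class CommitFilterSketch {
    // Java 8 release date, as stated in the text.
    static final LocalDate JAVA_8_RELEASE = LocalDate.of(2014, 3, 18);

    // Hypothetical commit representation, used only for illustration.
    static final class Commit {
        final String id;
        final LocalDate date;
        final String changeset;

        Commit(String id, LocalDate date, String changeset) {
            this.id = id;
            this.date = date;
            this.changeset = changeset;
        }
    }

    // Keeps commits made after the release date whose changeset
    // mentions at least one of the stream API keywords.
    static List<Commit> candidates(List<Commit> commits, Set<String> keywords) {
        return commits.stream()
                .filter(c -> c.date.isAfter(JAVA_8_RELEASE))
                .filter(c -> keywords.stream().anyMatch(c.changeset::contains))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> keywords = new HashSet<>(Arrays.asList("Collector", "groupingBy"));
        List<Commit> commits = Arrays.asList(
                new Commit("c1", LocalDate.of(2013, 1, 1), "use Collector"),
                new Commit("c2", LocalDate.of(2016, 5, 2), "fix groupingBy call"),
                new Commit("c3", LocalDate.of(2017, 7, 9), "update README"));
        candidates(commits, keywords).forEach(c -> System.out.println(c.id));
    }
}
```

A plain substring match like this one also illustrates why broad keywords such as “map” produce false positives.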


Finding Bug Fixes We used a feature of gitcproc that uses heuristics based 
on commit log messages to identify commits that are bug fixes. Natural language 
processing (NLP) is used to determine which commits fall in this category. This 
helps us to focus on the likely bug-fix commits for further manual examination. 
Next, the authors manually examined these commits to determine if the commits 
were indeed related to stream-related bugs. Three of the authors are software 
engineering and programming language professors with extensive expertise in 
streaming and parallel systems, concurrent systems, and empirical software engi- 
neering. The authors also have several years of industrial experience working as 
software engineers. As the authors did not always have expertise in the subject 
domains, only changes where a bug fix was extremely likely were marked as 
such. The authors also used commit comments and referenced bug databases to 
ascertain whether a change was a bug fix. This is a common practice [8,26,28]. 
Table 3. Stream bug/patch category legend. 


name                     description                                acronym

Bounds                   Incorrect/Missing Bounds Check             BC
Exceptions               Incorrect/Missing Exception Handling       EH
Other                    Other change (e.g., syntax, refactoring)   Other
Perf                     Poor Performance                           PP
Concur                   Concurrency Issue                          CI
Stream Source            Incorrect/Missing Stream Source            SS
Intermediate Operations  Incorrect/Missing Intermediate Operations  IO
Data Ordering            Incorrect Data Ordering                    DO
Operation Sequencing     Incorrect Operation Sequencing             OS
Filter Operations        Incorrect/Missing Filter Operations        FO
Map Operations           Incorrect/Missing Map Operations           MO
Terminal Operations      Incorrect/Missing Terminal Operations      TO
Reduction Operations     Incorrect Reduction Operations             RO
Collector Operations     Incorrect/Missing Collector Operations     CO
Incorrect Action         Incorrect Action (e.g., λ-expression)      IA


Classifying Bug Fixes Once bug fixes were identified, the authors studied the 
code changes to determine the category of bug fixes and whether the category 
relates to streams. Fortunately, we found that many commits reference bug 
reports or provide more details about the fix. Such information proved highly 
valuable in understanding the fixes. When in doubt, we also sent emails to 
developers for clarification purposes as git commits include email addresses. 


6.2 Results 


Quantitative Column kws of tab. 2 is the number of commits where occurrences 
of keywords were found and correspond to possible stream bug fixes. Column exe 
depicts the number of commits manually examined. From these 719 commits, we 
found 61 stream client code bug fixes. This is depicted in column total of tab. 4. 
Finding these bugs and understanding their relevance required a significant 
amount of manual labor that may not be feasible in larger-scale, automated 
studies. Nevertheless, as streams become more popular (they were only introduced 
in 2014), we expect the usage and number of bugs related to streams to grow. 

From the manual changes, we devised a set of common problem categories. 
Fixes were then grouped into these categories as shown in fig. 2 and tab. 4. A 
category legend appears in tab. 3, where column name is the “short” name of 
the bug category and is used in fig. 2. Column description is the category’s 
extended name, and column acronym is used in tab. 4. 

Fig. 2 presents a hierarchical categorization of the 61 stream-related bug fixes. 
Bugs are represented by their category name (column name in tab. 3) and their 
bug counts. Categories with no count are abstract, i.e., those grouping categories. 

Bugs are separated into two top-level categories, namely, bugs specifically 
related to stream API usage (stream-specific) and those tangentially related, 
i.e., bugs appearing in stream contexts but not specifically having to do with 
streams (generic). Generic bugs were further categorized into those related to 
exception handling (EH), bounds checking (BC), poor performance (PP), and “other.” 
Generic exception handling bugs (6) include those where, e.g., λ-expressions 
passed to stream operations threw exceptions that were not handled properly. 
Generic bounds checking bugs (2) included those where λ-expressions missed 
traversal boundary checks, and generic performance bugs (2) were those involving, 
e.g., local variables holding stream computation results. The “other” category (3) 
is aligned with a similar one used by Tian and Ray [50] and involved syntactic 
corrections, e.g., incorrect types, and refactorings. Generally, “other” bugs can 
either be stream-specific or generic. 

Fig. 2. Studied stream bugs and patches (hierarchical). 

Stream-specific bugs are further divided into several categories corresponding 
to whether they involved intermediate operations (IO), terminal operations 
(TO), the stream source (SS), concurrency (CI), and performance and exception 
handling bugs specific to streams. IO-specific bugs (2) are related to intermediate 
operations other than filter operations (FO, 7) and map operations (MO, 6), e.g., 
distinct(). IO bugs are additionally partitioned into those involving incorrect 
operation sequencing (OS, 2), e.g., map() before filter(), data ordering (DO, 2), 
e.g., operating on a stream that should have been sorted, and performance bugs 
appearing in intermediate operations other than map() and filter() (1). 

Terminal operations are split into two categories, namely, reduction operations 
(RO), e.g., collect(), reduce(), and side-effect producing operations, 
e.g., forEach(), iterator(). RO-specific bugs (3) were those related to scalar 
reductions, e.g., anyMatch(), allMatch(). RO-specific bugs related to collector 
operations (CO, 3), on the other hand, involve non-scalar reductions, e.g., a 
collector malfunction. RO-specific data ordering bugs (DO, 2) correspond to 
ordering of data related to scalar reductions, e.g., using findAny() instead of 
findFirst(). RO-specific incorrect actions (IA, 1) are where there is a problematic 
λ in a scalar reduction, e.g., an incorrect predicate in noneMatch(). Side-effect 
producing operation bugs also include incorrect actions (IA, 1), e.g., a problematic 
λ in forEach(). Such operations can also exhibit poor performance (PP, 1). 

Table 4. Studied stream bugs and patches (nonhierarchical). 


subject BC CI CO DO EH FO IA IO MO OS PP RO SS Other Total 
binnavi 1 1 
blueocean-plugin 1 1 
bootique 1 1 
che 1 1 1 1 4 
cryptomator 1 2 1 2 6 
dari 2 2 
eclipse.jdt.core 1 
eclipse.jdt.ui 1 1 
error-prone 2 1 1 3 1 1 2 1 12 
guava 1 
JacpFX 1 1 2 4 
jdp 1 
jetty 1 2 1 3 7 
jOOQ 1 
selenium 2 1 2 5 1 2 2 1 1 17 
threeten-extra 1 

Total 2 1 3 4 7 7 2 2 6 2 9 3 3 10 61 


Some bug categories are crosscutting, appearing under multiple categories. 
An example is performance. For this reason, tab. 4 portrays a nonhierarchical 
view of fig. 2, which is also broken down by subject, including a column for each 
bug category regardless of its parent category (acronyms correspond to tab. 3). 

Finding 7: Bugs, e.g., performance, crosscut concerns, affecting multiple 
categories, both specifically and tangentially, associated with streams. 

Performance issues dominate the functional (excluding “other”) bugs depicted 
in tab. 4, making up the categories “Performance/API misuse” and “Performance,” 
accounting for 14.75% (9/61) of the bugs found. While some of these fixes were 
more cleaning-based (e.g., superfluous operations), others affected central parts 
of the system and were found during performance regression testing [56]. 

Finding 8: Although streams feature performance improving parallelism, 
developers tend to struggle with using streams efficiently. 


Despite widespread performance issues, concurrency issues (CI), on the other 
hand, were not prevalent (1.64%). The one concurrency bug was where a stream 
operation involved non-atomic variable access, which resulted in improper ini- 
tialization [34]. Given that such a variable is accessed in a stream operation, 
however, it does indicate a possible side-effect and a need to consider refactoring 
such accesses to remove side-effects. This would make streams more amenable to 
efficient parallelization and perhaps promote more usage of parallel streams. 

Finding 9: Concurrency issues were the least common stream bugs. How- 
ever, concurrent variable access can cause thread contention, motivating 
future refactoring approaches that may promote more parallel streams. 

The subjects selenium and error-prone had the most stream bugs with 27.87% 
and 19.67%, respectively. We hypothesize that this is due to the relatively large 
size of these projects, as well as their high usage of streams. Specifically, they fell 
into the top ten in terms of KLOC and stream method calls in tab. 2 and fig. 1, 
respectively, with ~400 combined KLOC and 1,414 combined calls. Naturally, 
projects that use streams more are likely to have more bugs involving streams. 


Qualitative We highlight several of the most common bug categories with 
examples, summarize common fixes, and propose preliminary best practices (BP) 
and anti-patterns (AP). Due to space limitations, only a single example of each 
BP/AP is shown; a complete set is available in our dataset [25]. Although some 
APs may seem applicable beyond streams, e.g., avoiding superfluous operations, 
we conjecture that streams are more prone to such patterns, e.g., due to the ease 
in which operations can be chained and the deferred execution they offer. 


SS→PP Performance issues dominated the number of stream bugs found and also 
crosscut multiple categories. Consider the following performance regression [56]: 


Project: jetty 
Commit ID: 70311fe98787ffb8a74ad296c9dd2ba9ac431c9c 
Log: Issue #3681 
- List<HttpField> cookies = preserveCookies ? _fields.stream().filter(f -> 
-     f.getHeader() == HttpHeader.SET_COOKIE).collect(Collectors.toList()) : null; 
+ List<HttpField> cookies = preserveCookies ? _fields.getFields(HttpHeader.SET_COOKIE) : null; 


The stream field is replaced with getFields(), which performs an iterative traversal, 
effectively replacing streams with iteration. The developer found that using 
iteration was faster than using streams [57] and wanted more “JIT-friendly” code. 
The developer further admitted that using streams can make code easier to 
read but can also be associated with “allocation/complexity cost” [55]. 

BP1: Use performance regression testing to verify that streams in critical 
code paths perform efficiently. 
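
A first step towards such a regression test is to keep the stream-based and iterative variants side by side and check that they agree before comparing their running times. The sketch below shows only that equivalence check; Field is a hypothetical stand-in for jetty's HttpField, and an actual performance comparison would need a proper benchmarking harness rather than this code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamVsLoopSketch {
    // Hypothetical header field, used only for illustration.
    static final class Field {
        final String header;
        final String value;

        Field(String header, String value) {
            this.header = header;
            this.value = value;
        }
    }

    // Stream-based selection, mirroring the removed code.
    static List<Field> viaStream(List<Field> fields, String header) {
        return fields.stream()
                .filter(f -> f.header.equals(header))
                .collect(Collectors.toList());
    }

    // Iterative selection, mirroring the replacement.
    static List<Field> viaLoop(List<Field> fields, String header) {
        List<Field> selected = new ArrayList<>();
        for (Field f : fields) {
            if (f.header.equals(header)) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(
                new Field("Set-Cookie", "a=1"), new Field("Host", "x"), new Field("Set-Cookie", "b=2"));
        System.out.println(viaStream(fields, "Set-Cookie").equals(viaLoop(fields, "Set-Cookie")));
    }
}
```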

In the following, a pair of superfluous operations are removed: 


Project: JacpFX 
Commit ID: 4f0d62d3a0987c47a4cbdf8e056bdf89713e6aac 
Log: fixed class scanning 
  final Stream<String> componentIds = CommonUtil 
      .getStringStreamFromArray(annotation.perspectives()); 
  final Stream<Injectable> perspectiveHandlerList = 
-     componentIds.parallel().sequential().map(this::mapToInjectable); 
+     componentIds.map(this::mapToInjectable); 



getStringStreamFromArray() returns a sequential stream, which is then converted 
to parallel and then to sequential. The superfluous operations are then removed. 

AP1: Avoid superfluous intermediate operations. 
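
Why the parallel()/sequential() pair is superfluous can be seen in a small sketch: the execution mode is a property of the whole pipeline, and the most recent setting applies. The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SuperfluousOpsSketch {
    // The last execution-mode call wins for the whole pipeline, so
    // parallel() followed by sequential() leaves the stream sequential.
    static boolean parallelAfterToggle(List<String> ids) {
        return ids.stream().parallel().sequential().isParallel();
    }

    // Pipeline with the superfluous mode toggle.
    static List<String> withToggle(List<String> ids) {
        return ids.stream().parallel().sequential()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Equivalent pipeline without it.
    static List<String> withoutToggle(List<String> ids) {
        return ids.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("p1", "p2");
        System.out.println(parallelAfterToggle(ids));
        System.out.println(withToggle(ids).equals(withoutToggle(ids)));
    }
}
```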

Fix: Generally, fixes for performance problems varied widely. They ranged 
from replacing stream code with iterative code, as seen above, to removing 
operations, to changing the stream source representations. Depending on context, 
the bugs’ effect can either be innocuous or cause server performance degradation. 


SS→TO→RO→CO The stream API provides several ready-made Collectors 
for convenience. However, the API does not guarantee a specific non-scalar type used 
during the reduction. On one hand, this is convenient as developers may not need 
a specific collection type; on the other hand, however, developers must be careful 
to ensure that the specific subclass returned by the API meets their needs. 

In the following, the developer does not realize, until an incorrect program 
output, that the Map returned by Collectors.toMap() does not support nulls: 
Project: selenium 
Commit ID: 91eb004d230d8d78ec97180e66bec7055b16130f 
Log: Fix wrapping of maps with null values. Fixes #3380 
1   if (result instanceof Map) { 
2 -   return ((Map<String, Object>) result).entrySet().stream().collect(Collectors.toMap( 
3 -     e -> e.getKey(), e -> wrapResult(e.getValue()))); 
4 +   return ((Map<String, Object>) result).entrySet().stream().collect(HashMap::new, 
5 +     (m, e) -> m.put(e.getKey(), e.getValue()), Map::putAll); 


The ready-made collector (line 2) is replaced with a direct call to collect() with 
a particular Map implementation specified (line 4), i.e., HashMap. 

BP2: Use collectors only if client code is agnostic to particular container 
implementations. Otherwise, use the direct form of collect(). 

Fix: Collector-related bugs are typically corrected by not using a Collector 
(as above), changing the Collector used, or altering the Collector arguments. 
They often adversely affect program behavior but are also caught by unit tests. 
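
The null-handling difference behind this fix can be reproduced in isolation. The following sketch uses hypothetical class and method names; it shows that Collectors.toMap() throws a NullPointerException when a value is null, whereas the direct three-argument collect() into a HashMap accepts null values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class NullValuesSketch {
    // Returns true if the ready-made toMap() collector rejects the input
    // because one of its values is null.
    static boolean toMapRejectsNulls(Map<String, Object> source) {
        try {
            source.entrySet().stream()
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Direct form of collect(): supplier, accumulator, and combiner are
    // given explicitly, so HashMap's tolerance of null values applies.
    static Map<String, Object> copyAllowingNulls(Map<String, Object> source) {
        return source.entrySet().stream()
                .collect(HashMap::new, (m, e) -> m.put(e.getKey(), e.getValue()), Map::putAll);
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("a", null);
        source.put("b", 1);
        System.out.println(toMapRejectsNulls(source));
        System.out.println(copyAllowingNulls(source).equals(source));
    }
}
```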


SS→IO In the ensuing commit, distinct() is called on a concatenated stream 
to ensure that no duplicates are created as a result of the concatenation: 


Project: selenium 
Commit ID: eb7d9bf9cea19b8bc1759c4de1eb495829489cbe 
Log: Fix tests failing because of ProtocolHandshake 
- return Stream.concat(fromOss, fromW3c); 
+ return Stream.concat(fromOss, fromW3c).distinct(); 


BP3: Ensure concatenated streams have distinct elements. 

Fix: SS→IO bugs tend to be fixed by adding additional operations. 
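
A minimal sketch of BP3 follows, with hypothetical names and string elements standing in for the capabilities being concatenated in the commit. On an ordered stream, distinct() keeps the first occurrence of each element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ConcatDistinctSketch {
    // Concatenation alone keeps elements that occur in both inputs twice.
    static List<String> concat(List<String> a, List<String> b) {
        return Stream.concat(a.stream(), b.stream())
                .collect(Collectors.toList());
    }

    // Adding distinct() removes the duplicates introduced by concatenation.
    static List<String> concatDistinct(List<String> a, List<String> b) {
        return Stream.concat(a.stream(), b.stream())
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a", "b");
        List<String> b = Arrays.asList("b", "c");
        System.out.println(concat(a, b));
        System.out.println(concatDistinct(a, b));
    }
}
```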


SS→IO→Other Developers “bridged” back to an imperative style, performed an 
operation, then switched back to streams to continue in a more functional style: 


Project: jetty 
Commit ID: 91e9e7b76a08b776be21560d7ba20f9bfd943f04 
Log: Issue #984 Improve module listing 
1 - List<String> ordered = _modules.stream() 
2 -     .map(m->{return m.getName();}).collect(Collectors.toList()); 
3 - Collections.sort(ordered); 
4 - ordered.stream().map(n->{return get(n);}).forEach(module-> 
5 + _modules.stream().filter(m->...).sorted().forEach(module-> 



Each module is mapped to its name and collected into a list. Then, ordered 
is sorted via a non-stream Collections API. Another stream is then derived 
from ordered to perform further operations. However, on line 5, the bridge to 
a collection and subsequent sort operation is removed, and the computation 
remains within the stream API. It is now more amenable to parallelization. 
AP2: Avoid “bridging” between stream API and legacy collection APIs. 
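
The same refactoring can be sketched on plain strings. The names below are hypothetical, and the trim/upper-case steps merely stand in for the name lookup and module retrieval in the commit; the point is that sorted() lets the computation stay in one pipeline.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class BridgingSketch {
    // "Bridged" variant: collect, sort through the Collections API,
    // then start a second stream on the sorted list.
    static List<String> bridged(List<String> names) {
        List<String> ordered = names.stream()
                .map(String::trim)
                .collect(Collectors.toList());
        Collections.sort(ordered);
        return ordered.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Single-pipeline variant: sorting is an intermediate operation.
    static List<String> inPipeline(List<String> names) {
        return names.stream()
                .map(String::trim)
                .sorted()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(" b", "a ", "c");
        System.out.println(bridged(names).equals(inPipeline(names)));
    }
}
```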

Using a long λ-expression in a single map() operation may make stream code 
less “functional,” more difficult to read [29], and less amenable to parallelism. 
Consider the abbreviated commit below that returns the occupied drive letters 
on Windows systems by collecting the first uppercase character of the path: 


Project: cryptomator 
Commit ID: b691e374eb2dad0284e13927e7c3fclfdcecae9bf 
Log: fixes #74 
- return rootDirs.stream().map(path -> path.toString().toUpperCase() 
-     .charAt(0)).collect(toSet()); 
+ return rootDirs.stream().map(Path::toString).map(CharUtils::toChar) 
+     .map(Character::toUpperCase).collect(toSet()); 



The λ-expression has been replaced with method references; however, there 
are more subtle yet important changes. Firstly, as CharUtils.toChar() returns the 
first character of a String, there is a small performance improvement as the 
entire string is no longer turned to uppercase but rather only the first character. 
Also, the new version is written in more of a functional-style by replacing the 
single λ-expression passed to map() with multiple map() operations. How data is 
transformed in the pipeline is easily visible, and future data transformations can 
be easily integrated by simply adding operations. 

AP3: Avoid too many operations within a single map() operation. 
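
AP3 can be sketched without external dependencies as follows. The helper firstChar() is a hypothetical stand-in for CharUtils.toChar(), and plain strings replace the Path objects of the commit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SplitMapSketch {
    // Hypothetical stand-in for CharUtils.toChar(): first character of s.
    static char firstChar(String s) {
        return s.charAt(0);
    }

    // One map() doing two things: upper-casing the whole string
    // and then extracting its first character.
    static Set<Character> singleLambda(List<String> paths) {
        return paths.stream()
                .map(p -> p.toUpperCase().charAt(0))
                .collect(Collectors.toSet());
    }

    // One transformation per map(): only the first character is upper-cased.
    static Set<Character> chained(List<String> paths) {
        return paths.stream()
                .map(SplitMapSketch::firstChar)
                .map(Character::toUpperCase)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("c:/", "d:/data");
        System.out.println(singleLambda(paths).equals(chained(paths)));
    }
}
```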


Fix: “Other” non-type correcting fixes, e.g., refactorings, included introduc- 
ing streams, sometimes from formerly iterative code (3), replacing map() with 
mapToInt() [20], and dividing “larger” operations into smaller ones. 


7 Threats to Validity 


Subjects may not be representative. To mitigate this, subjects were chosen 
from diverse domains and sizes. They have also been used in previous studies 
(e.g., [20,22]). Although java-design-patterns is artificial, it is a reference im- 
plementation similar to that of JHotDraw, which has been studied extensively 
(e.g., [31]). Also, as streams are relatively new, we expect a larger selection of 
subjects as they grow in popularity. 

Entry points may not be correct, which could affect how stream attributes are 
calculated. Since standard entry points were chosen, these represent a superset 
of practically true entry points. Furthermore, there may be custom streams or 
collectors outside the standard API that we have not considered. As we aim to 
understand stream usage and misuse in the large, we hypothesize that the vast 
majority of projects using streams use ones from the standard libraries. 

Our study involved many hours of manual validation, which can be subject 
to bias. However, we investigated referenced bug reports and other comments 
from developers to help us understand changes more fully. We also reached out to 
several developers via email correspondence when in doubt. All but one returned 
the correspondence. The NLP features of gitcproc may have missed changesets 
that were indeed bug fixes. Nevertheless, we were still able to find 61 bugs 
that contributed to a rich bug categorization, best practices, and anti-patterns. 
Furthermore, gitcproc has been used previously in other studies. 


8 Related Work 


Previous studies [29,32,36,46,51] have focused specifically on λ-expressions. While 
λ-expressions are used as arguments to stream operations, our focus is on stream 
operations themselves. Such operations transition streams to different states, 
which can be detrimental to parallel performance [24,35]. Also, since streams 
can be aliased, we use a tool [24] based on typestate analysis to obtain stream 
attributes more reliably than AST-based approaches. We also study bugs related 
to stream usage and present developer feedback; fixing bugs related to streams 
may not involve changing λ-expressions, as bugs can be caused by, e.g., an incorrect 
sequence of stream operations. Lastly, although Nielebock et al. [36] consider 
λ-expressions in “concurrency contexts,” such contexts do not include streams, 
where λ-expressions can easily execute in parallel with minimal syntactical effort. 

Khatchadourian et al. [24] report on some stream characteristics as part of 
their refactoring evaluation but do so on a much smaller-scale, as their focus 
was on the refactoring algorithm. The work presented here goes significantly 
above and beyond by reporting on a richer set of stream characteristics (e.g., 
execution mode, ordering), with a notably larger and updated corpus. We 
also include a comprehensive categorization of stream-related bug fixes, with 719 
commits manually analyzed. Preliminary best practices and anti-patterns are 
also proposed. 

Zhou et al. [60] conduct an empirical study on 210 service quality issues of a 
big data platform at Microsoft to understand their common symptoms, causes, 
and mitigations. They identify hardware faults, systems, and customer side effects 
as major causes of quality issues. There are also empirical studies on data-parallel 
programs. Kavulya et al. [19] study failures in MapReduce programs. Jin et al. 
[18] study performance slowdowns caused by system side inefficiencies. Xiao 
et al. [58] conduct a study on commutativity, nondeterminism, and correctness 
of data-parallel programs, revealing that non-commutative reductions lead to 
bugs. Though related, our work specifically focuses on stream APIs as a language 
feature and programming paradigm, which pose special considerations due to their 
shared memory model, i.e., interactions between the operations and local memory. 
Bloch [6, Ch. 7] also puts forth stream best practices and anti-patterns. However, 
ours are based on a statistical analysis of real-world software and first-hand 
interactions with real-world developers. 

Others also study language features. Parnin et al. [43] study the adoption of 
Java generics. Dyer et al. [11] build an expansive infrastructure for studying the 
use of language features over time. Khatchadourian and Masuhara [22] employ a 
proactive approach in empirically assessing new language features and present 
a case study on default methods. There are also many studies regarding bug 
analysis. For example, Engler et al. [13] present a general approach to inferring 
errors in systems code, and Tian and Ray [50] study error handling bugs in C. 


9 Conclusion & Future Work 


This study advances our understanding of stream usage and bug patterns. We 
have surveyed common stream operations, attributes, and bugs specific and 
tangentially related to streams. A hierarchical taxonomy of stream bugs was 
devised, preliminary best practices and anti-patterns were proposed, and first- 
hand developer interactions were detailed. In the future, we will explore stream 
creation, use our findings to devise automated error checkers, and explore topics 
that interest stream developers. Lastly, we will investigate applicability to other 
streaming frameworks and languages. 


Acknowledgments We would like to thank Krishna Desai and Robert Dyer 
for work on data summarization and discussions, respectively. Support for this 
project was provided by PSC-CUNY Award #617930049, jointly funded by The 
Professional Staff Congress and The City University of New York. This material 
is based upon work supported by the National Science Foundation under Grant 
No. CCF 1845893, CNS 1842456, and CCF 1822965. 


An Empirical Study on the Use and Misuse of Java 8 Streams 115 


References 


1. Ahmed, S., and Bagherzadeh, M.: What Do Concurrency Developers Ask About?: A 
   Large-scale Study Using Stack Overflow. In: International Symposium on Empirical 
   Software Engineering and Measurement, 30:1-30:10 (2018). DOI: 10.1145/3239235.3239524 
2. AOL: AOL/cyclops: An advanced, but easy to use, platform for writing functional 
   applications in Java 8. (2019). http://git.io/fjxzF (visited on 08/29/2019) 
3. Bagherzadeh, M., and Khatchadourian, R.: Going Big: A Large-scale Study on What 
   Big Data Developers Ask. In: Joint Meeting on European Software Engineering 
   Conference and Symposium on the Foundations of Software Engineering. ESEC/FSE 
   2019, pp. 432-442. ACM, Tallinn, Estonia (2019). DOI: 10.1145/3338906.3338939 
4. Bagherzadeh, M., and Rajan, H.: Order Types: Static Reasoning About Message 
   Races in Asynchronous Message Passing Concurrency. In: International Workshop 
   on Programming Based on Actors, Agents, and Decentralized Control, pp. 21-30 
   (2017). DOI: 10.1145/3141834.3141837 
5. Biboudis, A., Palladinos, N., Fourtounis, G., and Smaragdakis, Y.: Streams a la carte: 
   Extensible Pipelines with Object Algebras. In: European Conference on Object- 
   Oriented Programming, pp. 591-613 (2015). DOI: 10.4230/LIPIcs.ECOOP.2015.591 
6. Bloch, J.: Effective Java. Prentice Hall, Upper Saddle River, NJ, USA (2018) 
7. Bordet, S.: Pull Request #2837 · eclipse/jetty.project, Webtide. (2018). 
   http://git.io/JeBAF (visited on 10/20/2019) 
8. Casalnuovo, C., Devanbu, P., Oliveira, A., Filkov, V., and Ray, B.: Assert Use in 
   GitHub Projects. In: International Conference on Software Engineering. ICSE ’15, 
   pp. 755-766. IEEE Press, Florence, Italy (2015). 
   http://dl.acm.org/citation.cfm?id=2818754.2818846 
9. Casalnuovo, C., Suchak, Y., Ray, B., and Rubio-Gonzalez, C.: GitcProc: A Tool 
   for Processing and Classifying GitHub Commits. In: International Symposium on 
   Software Testing and Analysis. ISSTA 2017, pp. 396-399. ACM, Santa Barbara, 
   CA, USA (2017). DOI: 10.1145/3092703.3098230 
10. Dean, J., and Ghemawat, S.: MapReduce: Simplified Data Processing on Large 
    Clusters. Commun. ACM 51(1), 107-113 (2008). DOI: 10.1145/1327452.1327492 
11. Dyer, R., Rajan, H., Nguyen, H.A., and Nguyen, T.N.: Mining Billions of AST Nodes 
    to Study Actual and Potential Usage of Java Language Features. In: International 
    Conference on Software Engineering. ICSE 2014, pp. 779-790. ACM, Hyderabad, 
    India (2014) 
12. Eclipse Foundation: Eclipse Java development tools (JDT), Eclipse Foundation. 
    (2019). http://eclipse.org/jdt (visited on 10/19/2019) 
13. Engler, D., Chen, D.Y., Hallem, S., Chou, A., and Chelf, B.: Bugs As Deviant 
    Behavior: A General Approach to Inferring Errors in Systems Code. In: Symposium 
    on Operating Systems Principles. SOSP ’01, pp. 57-72. ACM, Banff, Alberta, 
    Canada (2001). DOI: 10.1145/502034.502041 
14. EPFL: Collections - Mutable and Immutable Collections - Scala Documentation, 
    (2017). http://scala-lang.org/api/2.12.3/scala/collection/index.html (visited on 
    08/24/2018) 
15. Erdfelt, J.: Pull Request #2837 · eclipse/jetty.project, Eclipse Foundation. (2018). 
    http://git.io/JeBAM (visited on 10/20/2019) 
16. Fink, S.J., Yahav, E., Dor, N., Ramalingam, G., and Geay, E.: Effective Typestate 
    Verification in the Presence of Aliasing. ACM Transactions on Software Engineering 
    and Methodology 17(2), 9:1-9:34 (2008). DOI: 10.1145/1348250.1348255 














Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.




Extracting Semantics from Question-Answering 
Services for Snippet Reuse 


Themistoklis Diamantopoulos, Nikolaos Oikonomou, and Andreas
Symeonidis


Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki 
Thessaloniki, Greece 
thdiaman@issel.auth.gr, nikooiko@ece.auth.gr, asymeon@eng.auth.gr 


Abstract. Nowadays, software developers typically search online for
reusable solutions to common programming problems. However, forming
the question appropriately, and locating and integrating the best
solution back into the code can be tricky and time-consuming. As a result,
several mining systems have been proposed to aid developers in
the task of locating reusable snippets and integrating them into their
source code. Most of these systems, however, do not model the semantics
of the snippets in the context of the source code provided. In this work,
we propose a snippet mining system, named StackSearch, that extracts
semantic information from Stack Overflow posts and recommends useful
and in-context snippets to the developer. Using a hybrid language
model that combines Tf-Idf and fastText, our system effectively understands
the meaning of the given query and retrieves semantically similar
posts. Moreover, the results are accompanied by useful metadata, obtained using
a named entity recognition technique. Upon evaluating our system on a
set of common programming queries, on a dataset based on post links,
and against a similar tool, we argue that our approach can be useful for
recommending ready-to-use snippets to the developer.


Keywords: Code Search · Snippet Mining · Code Semantic Analysis ·
Question-Answering Systems.


1 Introduction 


Lately, the widespread use of the Internet and the introduction of the open- 
source development initiative have given rise to a new way of developing soft- 
ware. Developers nowadays rely more than ever on online services in order to 
solve common problems arising during development, including e.g. developing a 
component, integrating an API, or even fixing a bug. This new reuse paradigm 
has been greatly supported by search engines, code hosting facilities, programming
forums, and question-answering communities, such as Stack Overflow¹.
One could even argue that software today is built using reusable components, 
which are found in software libraries and are exposed via APIs. 


1 https://stackoverflow.com/
As a result, the challenge lies in properly integrating these APIs/components
in order to support the required functionality. This process is typically performed
via snippets, i.e. small code fragments that usually perform clearly defined tasks
(e.g. reading a CSV file, connecting to a database, etc.). Given the vastness of
data in the services outlined in the previous paragraph (e.g. Stack Overflow alone
has more than 18 million question posts²), locating the most suitable snippet to
perform a task and integrating it into one’s own source code can be hard. In this
context, developers often have to leave the IDE, form a query in an online tool,
and navigate through several solutions before finding the most suitable one.

To this end, several systems have been proposed. Some of these systems focus 
on the API usage mining problem [5,9,13,14,17,18,27,30] and extract examples 
for specific library APIs, while others offer more generic snippet mining solutions 
[3, 6, 28, 29] and further allow queries for common programming problems (e.g. 
how to read a file in Java). Both types of systems usually employ an indexing 
mechanism that allows developers to form a query and retrieve relevant snippets. 

These systems, however, have important limitations. First of all, several of
them do not allow queries in natural language and may require the developer
to spend time in order to form a query in some specialized format. Secondly,
most systems index only information extracted from source code, without accounting
for the semantics that can be extracted from comments or even from
the surrounding text in the context (web location) in which each snippet is found.
Furthermore, most tools employ some type of lexical (term frequency) indexing,
thus not exploiting the benefits of embeddings that can lead to semantic-aware
retrieval. Finally, the format and the presentation of the results are most of the
time far from optimal. There are systems that return call sequences as opposed
to ready-to-use snippets, while, even when snippets are retrieved, they are sometimes
provided as-is without any additional information concerning their APIs.

In this paper, we design and develop StackSearch, a system that receives 
queries in natural language and employs an indexing mechanism on Stack Over- 
flow data in order to retrieve useful snippets. The indexing mechanism takes ad- 
vantage of all possible information about a snippet by extracting semantics from 
both the textual (title, tags, body) and the source code part of Stack Overflow 
posts. The information is extracted using lexical matching as well as embeddings 
in order to produce a hybrid model and retrieve the most useful results, even 
when taking into account the possible ambiguities of natural language. Finally, 
the snippets retrieved by StackSearch are accompanied by relevant labels that 
provide an interpretation of the semantics of the posts and the employed APIs. 


2 Related Work 


As already mentioned, we focus on snippet mining systems that recommend 
solutions to typical programming problems. Some of the first systems proposed 
in this area were Prospector [16] and PARSEWeb [25]. These systems focus on 


2 Source: https://data.stackexchange.com/
recommending snippets that form a path from a source object to a target
object. For Prospector, these paths are called jungloids and the program flow is a
jungloid graph. Though interesting, the system has a local database, which limits
its applicability. PARSEWeb, on the other hand, uses the Google search engine
and produces better results in most scenarios [25]. However, both systems have
important limitations: they require the developer to know which API calls to use
and further receive queries in a specialized format, and not in natural language.


Another popular category of systems in current research involves those fo- 
cusing on the challenge of API usage mining, such as MAPO [30], UP-Miner [27] 
or PAM [9]. The problem is typically defined as extracting common usage pat- 
terns from client source code, i.e. source code that uses the relevant API. To 
do so, MAPO employs frequent sequence mining, while UP-Miner uses graphs 
and mines frequent closed API call paths. PAM, on the other hand, employs 
probabilistic machine learning to extract sequences that exhibit higher coverage 
of the API under analysis and are more diverse [9]. Though quite effective, these 
systems are actually limited to the API under analysis and cannot support more 
generic queries. Furthermore, they too do not accept queries in natural language, 
while their output is in the form of sequences, instead of ready-to-use snippets. 


Similar conclusions can be drawn for API mining systems that output snip- 
pets. For example, APIMiner [17] performs code slicing in order to generate 
common API usage examples, while eXoaDocs [14] further performs semantic 
clustering (using the DECKARD code clone detection algorithm [11]) to group 
them according to their functionality. CLAMS [13] also clusters the snippets and 
further generates the most representative (medoid) snippet of each cluster us- 
ing slicing and code summarization techniques. Another interesting approach is 
MUSE [18], which employs a novel ranking scheme for the recommended snippets 
based on metrics such as the ease of reuse, a metric computed by determining 
whether a snippet has custom object types, and thus requires external dependencies.
As with the previous approaches, these systems are effective for mining
API usage examples; however, they do not generalize to the problem of receiving
natural language queries and retrieving API-agnostic reusable solutions.


This more generic snippet mining scenario is supported by several contempo- 
rary systems. One such system is SnipMatch [29], which employs pattern-based 
code search to retrieve useful snippets. SnipMatch, however, relies on a local 
index that has to be updated by the developer. More advanced systems in
this aspect usually connect to online search engines and process their results to 
extract and recommend snippets. For example, Blueprint [3] and CodeCatch [6] 
employ the Google search engine, while Bing Code Search [28] employs Bing. 
Due to the integration with strong engines, these systems tend to offer effective 
natural language understanding features and their results are adequate even in 
less common queries. However, the text surrounding the code is not parsed for se- 
mantic information, so the quality of the retrieved snippets is bound only to the 
semantics introduced by the search engines. Moreover, the agnostic web search 
that these systems perform may often be suboptimal compared to issuing the 
queries to a better focused question-answering service. 
These limitations have led to more specialized tools that employ Stack Overflow
in order to recommend snippets that are proposed by the community and
are accompanied by useful metadata. One of the first such systems is Example
Overflow [35], an online code search engine that uses Tf-Idf as a scoring mechanism
and retrieves snippets relevant to the jQuery framework. Two other systems
in this area, which are built as plugins of the Eclipse IDE, are Prompter [22] and
Seahawk [21]. Prompter employs a sophisticated ranking mechanism based not
only on the code of each snippet, but also on metadata, such as the score of the
post or the reputation of the user that posted it on Stack Overflow. Seahawk also
uses similar metadata upon building a local index using Apache Solr³. The main
limitation of the systems in this category is their reliance on term occurrence;
the lack of more powerful semantics restricts the retrieved results to cases where
the query terms appear as-is within the Stack Overflow posts.

Finally, there are certain research efforts towards semantic-aware snippet re- 
trieval. SWIM [23], for instance, which is proposed by the research team behind 
Bing Code Search [28], uses a natural language to API mapper that computes 
the probability Pr(t|Q) that an API t appears as a result to a query Q. The sys- 
tem retrieves the most probable snippets and synthesizes them to produce valid 
and human-readable snippets. A limitation of SWIM, which was highlighted by 
Gu et al. [10], is that it follows the bag-of-words assumption, therefore it cannot 
distinguish among certain queries (e.g. “convert number to string” and “convert
string to number”). The authors instead propose DeepAPI [10], a system that defines
snippet recommendation as a machine translation problem, where natural
language is the source language and source code is the target language. DeepAPI 
employs a model with three recurrent neural networks (one for the text of the 
query as-is, one for the same text reversed, and one to combine them) that re- 
trieves the most relevant API call sequence given a query. The system, however, 
is largely based on code comments, so its performance depends on whether there 
is sufficient documentation in the snippets of its index. A similar approach is 
followed by T2API [20], another Eclipse plugin that uses a graph-based translation
approach to translate query text into API usages. This system, however,
is also largely based on synthesizing API calls and does not focus on semantic 
retrieval. Finally, an even more recent system is CROKAGE [24], which employs 
embeddings and further expands the query with relevant API classes from Stack 
Overflow. The final results are ranked according to multiple factors, including 
their lexical and semantic similarity with the query and their similar API usage. 

In conclusion, the systems analyzed in the above paragraphs have the limita- 
tions that were discussed also in the introduction of this work. Several of them 
are focused only on APIs without generalizing to common programming prob- 
lems. And while there are certain systems that allow queries in natural language, 
most of them rely on term frequency indexing and do not incorporate semantics 
extracted from the context of the snippets. In this work, we design a hybrid system
that employs both a lexical (term frequency) and a word embeddings model on 
Stack Overflow posts’ data. Note that, compared to source code comments that 


3 https://lucene.apache.org/solr/
may be incomplete or sometimes even non-existent, the text of Stack Overflow
posts is a more complete source of information as it is the outcome of the expla- 
nation efforts of different members of the community [7]. As a result, our system 
can extract the semantic meaning of natural language queries and retrieve useful 
snippets, which are accompanied by semantic-aware labels. 


3 StackSearch: A Semantic-aware Snippet Recommender 


3.1 Overview 


The architecture of StackSearch is shown in Figure 1. The left part of the figure 
refers to building the index while the right one refers to answering user queries. 




Fig. 1. Architecture of StackSearch 


At first, our system retrieves information from Stack Overflow⁴ and builds
an SQLite⁵ database of all Java posts. Note that our methodology is mostly
language agnostic; however, we use Java here as a proof of concept⁶. We created
four tables in order to store question posts, answer posts, comments, and post 
links (to be used for evaluation, see subsection 4.2). For each of these tables we 
kept all information, i.e. title, tags, body, score, etc., as well as all connections 
of the data dump as foreign keys (e.g. any answer has a foreign key towards the 
corresponding question), so that we fully take into account the post context. 

Upon storing the data in a suitable format, the Preprocessor receives as 
input all question posts, answer posts, and comments and extracts a corpus of 
texts. The corpus is then given to the Word Model Builder, which trains different 
models to transform the text to vector form. Finally, the system stores three
artifacts: a vector index, in which each set of vectors corresponds to the title,
tags, and body of one question post; the produced word models; and certain
metadata for each question, which are extracted by the Metadata Extractor.


4 We used the latest official data dump provided by Stack Overflow, which is available 
at https://archive.org/details/stackexchange 

5 https://www.sqlite.org/

6 Applying our methodology to a different language requires only providing a prepro- 
cessor in order to extract the relevant source code elements from the post snippets. 
When the developer issues a query, the Querying Engine initially extracts
a vector for the query given the stored vector models, and then computes the 
similarity between the query vector and each vector in the vector index. The 
engine then ranks the results and presents them to the user along with their 
metadata. The steps required to build the index as well as the issuing of queries 
are discussed in detail in the following subsections. 
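
The query flow described above can be illustrated with a minimal sketch. The text only states that a similarity is computed between the query vector and each indexed vector; cosine similarity is assumed here, and the names `cosine_similarity`, `rank_posts`, and the toy index are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_posts(query_vector, vector_index, top_k=3):
    """Score every indexed post against the query and return the top_k (post_id, score) pairs."""
    scored = [
        (post_id, cosine_similarity(query_vector, vector))
        for post_id, vector in vector_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: post id -> vector (real vectors would come from the word models).
toy_index = {
    "post_a": [1.0, 0.0, 0.0],
    "post_b": [0.7, 0.7, 0.0],
    "post_c": [0.0, 0.0, 1.0],
}
ranking = rank_posts([1.0, 0.1, 0.0], toy_index, top_k=2)
```

A linear scan such as this one is enough to show the ranking step; whether the actual system uses a faster index structure is not stated in this section.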


3.2 Preprocessor 


Upon creating our database, the next step is to preprocess the data in order 
to build the corpus that will be used to train our models. We extract the text 
and the code of each post by parsing the <pre> and <code> tags. We further 
remove all HTML tags from the text and then perform a series of preprocessing steps.
At first, the code is parsed to extract its semantic information. The posts are 
then filtered to remove the ones that introduce noise to the dataset and, finally, 
the texts are tokenized. These steps are outlined in the following paragraphs. 
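
As an illustration of the first step, the following sketch separates the code blocks of a post body from its text using only the Python standard library. It assumes that code blocks are wrapped in `<pre>` elements and that inline `<code>` fragments stay in the text; the class `PostBodyParser` and the function `split_post` are hypothetical names, not part of the described system.

```python
from html.parser import HTMLParser


class PostBodyParser(HTMLParser):
    """Collects the content of <pre> blocks as snippets and all other data as plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.snippets = []
        self._pre_depth = 0
        self._current_snippet = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._pre_depth += 1
            if self._pre_depth == 1:
                self._current_snippet = []

    def handle_endtag(self, tag):
        if tag == "pre" and self._pre_depth > 0:
            self._pre_depth -= 1
            if self._pre_depth == 0:
                self.snippets.append("".join(self._current_snippet))

    def handle_data(self, data):
        # Data inside <pre> belongs to a snippet; everything else is post text.
        if self._pre_depth > 0:
            self._current_snippet.append(data)
        else:
            self.text_parts.append(data)


def split_post(body_html):
    """Return (text_without_tags, list_of_code_snippets) for one post body."""
    parser = PostBodyParser()
    parser.feed(body_html)
    parser.close()
    text = " ".join("".join(parser.text_parts).split())
    return text, parser.snippets


text, snippets = split_post(
    "<p>Use a <code>BufferedReader</code>:</p>"
    "<pre><code>String line = br.readLine();</code></pre>"
)
```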


Extracting Semantics from Source Code Upon extracting the code from 
each question post, we parse it using an extension of the parser described in [8]. 
The parser checks if the snippets are compilable and also drops any snippets that 
are not written in Java. Upon making these checks, our parser extracts the AST 
of each snippet and takes two passes over it, one to extract type declarations, 
and one to extract method invocations (i.e. API calls). For example, in the 
snippet of Figure 2, the parser initially extracts the declarations is: InputStream, 
br: BufferedReader, and sb: StringBuilder (strings and exceptions are excluded). 
After that, it extracts the relevant API calls, which are highlighted in Figure 2. 


// initialize an InputStream
InputStream is = new ByteArrayInputStream("sample".getBytes());
// convert InputStream to String
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
    br = new BufferedReader(new InputStreamReader(is));
    while ((line = br.readLine()) != null) { sb.append(line); }
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (br != null) {
        try { br.close(); } catch (IOException e) { e.printStackTrace(); }
    }
}


Fig. 2. Example snippet for “How to read a file line by line” (API calls highlighted) 
Finally, the calling object of each API call is replaced by its type and the
text of comments is also retrieved to produce the sequence shown in Figure 3. 


initialize an InputStream, InputStream, ByteArrayInputStream, convert InputStream to
String, BufferedReader, StringBuilder, StringBuilder, BufferedReader, InputStreamReader, 
BufferedReader.readLine, StringBuilder.append, BufferedReader.close 


Fig. 3. Extracted sequence for the snippet of Figure 2 
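
The replacement of calling objects by their types can be sketched as follows. The sketch does not parse Java; it assumes that a parser has already emitted an ordered list of events (comments, declarations, constructor calls, and method invocations), which is a hypothetical intermediate format introduced here only for illustration, as is the function name `build_sequence`.

```python
def build_sequence(events):
    """Turn parser events into the textual sequence that replaces a snippet.

    Each event is a tuple whose first element is its kind:
      ("comment", text), ("decl", variable, type), ("new", type),
      ("call", receiver_variable, method_name)
    """
    symbol_types = {}
    sequence = []
    for event in events:
        kind = event[0]
        if kind == "comment":
            sequence.append(event[1])
        elif kind == "decl":
            _, name, type_name = event
            symbol_types[name] = type_name
            sequence.append(type_name)
        elif kind == "new":
            sequence.append(event[1])
        elif kind == "call":
            _, receiver, method = event
            # Calls on undeclared receivers (e.g. excluded exception variables) are skipped.
            if receiver in symbol_types:
                sequence.append(f"{symbol_types[receiver]}.{method}")
    return sequence


# Events corresponding to the snippet of Figure 2 (written by hand for this example).
figure2_events = [
    ("comment", "initialize an InputStream"),
    ("decl", "is", "InputStream"),
    ("new", "ByteArrayInputStream"),
    ("comment", "convert InputStream to String"),
    ("decl", "br", "BufferedReader"),
    ("decl", "sb", "StringBuilder"),
    ("new", "StringBuilder"),
    ("new", "BufferedReader"),
    ("new", "InputStreamReader"),
    ("call", "br", "readLine"),
    ("call", "sb", "append"),
    ("call", "e", "printStackTrace"),
    ("call", "br", "close"),
]
sequence = build_sequence(figure2_events)
```

With these hand-written events, the result contains the same twelve elements as the sequence of Figure 3.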


Filtering the Posts Filtering is performed using a classifier that rules out any
posts that are considered by our system as noise. We used the regional CNN-LSTM
model of Wang et al. [26], shown in Figure 4, which combines the
CNN and LSTM architectures and succeeds in capturing the characteristics of
text while also considering its order. Our classifier is binary; it receives as input the
data of each post and its output determines whether a post is useful or noisy.
Fig. 4. Architecture of Regional CNN-LSTM model by Wang et al. [26]


The input embedding layer receives a one-hot encoding that corresponds to
the concatenation of the title, body and tags of each post. Tokenization and one-hot
encoding are performed before the text is given as input, so no rules are applied
other than splitting on spaces and punctuation (this tokenization process is only
used here on-the-fly to filter the posts, while we fully tokenize the text afterwards
as described in the Text Tokenization paragraph below). Punctuation marks are also
kept, as each of them is actually a token. After that, the classifier includes a CNN
layer, which extracts and amplifies the terms (including punctuation) that cause noise.
The CNN layer is followed by a max pooling layer that is used to reduce the number
of parameters that have to be optimized by the model. The next layer
is the LSTM, which captures semantic information from nearby terms; its result is
finally given to the output layer to provide the binary decision.
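
The on-the-fly tokenization used for filtering can be approximated with a short sketch. It splits on whitespace, keeps every punctuation mark as its own token, and maps tokens to integer indices, each of which stands for one position of a one-hot vector. The regular expression, the function names, and the choice of index 0 for unknown tokens are assumptions made for this example.

```python
import re

# A token is either a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    """Split on spaces and punctuation, keeping punctuation marks as tokens."""
    return TOKEN_PATTERN.findall(text)


def build_vocabulary(token_lists):
    """Assign each distinct token an index starting at 1 (0 is reserved for unknown tokens)."""
    vocabulary = {}
    for tokens in token_lists:
        for token in tokens:
            vocabulary.setdefault(token, len(vocabulary) + 1)
    return vocabulary


def encode(tokens, vocabulary):
    """Replace every token by its index; each index identifies one one-hot position."""
    return [vocabulary.get(token, 0) for token in tokens]


tokens = simple_tokenize("Why NullPointerException? at line 3.")
vocabulary = build_vocabulary([tokens])
encoded = encode(simple_tokenize("Why at line 7?"), vocabulary)
```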

To train our classifier, we have annotated a set of 2500 posts. For each post, 
we consider it noisy if it has error logs, debug logs or stack traces. Though 
useful in other contexts, in our case these posts would skew our models, as they 
contain a lot of generic data. Furthermore, we deem noisy any posts with large
amounts of numeric data (usually in tables) and any posts with code snippets 
in languages other than Java. The training was performed with accuracy as 
the metric to optimize, while we also used dropout to avoid overfitting. Upon 
experimenting with different parameter values, we ended up using the Adagrad 
optimizer, while the dropout and recurrent dropout parameters were set to 0.6 
and 0.05 respectively. Setting the embedding length to 35 and the number of 
epochs to 5 proved adequate, as our classifier achieved accuracy equal to 0.94. 


Text Tokenization Upon filtering, we now have a set of texts that must be 
tokenized before they are given as input to the models. Since tokenization might 
split Java terms (e.g. method invocations), we excluded these from tokenization 
using regular expressions. After that, we removed all URLs and all non-alphabetical 
characters (i.e. numbers and special symbols) and tokenized the text. 
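
The tokenization steps above could be sketched as follows. This is only an illustration: the pattern used to recognize Java terms (dotted identifiers, optionally followed by empty parentheses) is an assumption, since the exact regular expressions are not given in the text, and the function name is hypothetical.

```python
import re

# Hypothetical pattern for Java terms such as "Files.readAllBytes" or
# "sb.toString()"; the patterns actually used are not given in the text.
JAVA_TERM = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+(?:\(\))?")
URL = re.compile(r"https?://\S+")


def tokenize(text):
    """Remove URLs, keep Java terms intact, drop non-alphabetical characters."""
    text = URL.sub(" ", text)
    tokens = []
    position = 0
    for match in JAVA_TERM.finditer(text):
        # Plain text before the Java term: keep alphabetical words only.
        tokens.extend(re.findall(r"[A-Za-z]+", text[position:match.start()]))
        tokens.append(match.group(0))  # the Java term is kept as one token
        position = match.end()
    tokens.extend(re.findall(r"[A-Za-z]+", text[position:]))
    return tokens
```

Note that such a simple pattern would also treat abbreviations like "e.g" as dotted identifiers, so a practical implementation would need more careful rules.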


3.3 Word Model Builder 


We build two models for capturing the semantics of posts, a Tf-Idf model and 
a fastText embedding. These models are indicative of lexical matching and 
semantic matching, respectively. They will serve as baselines and at the same time 
be used to build a more powerful hybrid model (see subsection 3.5). Both models 
are executed three times, once for the titles of the question posts, once for their 
bodies, and once for their tags. As already mentioned, the code snippets are 
replaced by their corresponding text sequences, so they are now textual parts of 
the bodies. The two models are analyzed in the following paragraphs. 


Tf-Idf Model We employ a vector space model to represent the texts (titles 
or bodies or tags) as documents and the words/terms as dimensions. The vector 
representation for each document is extracted using a Tf-Idf vectorizer. According 
to Tf-Idf, the weight (vector value) of each term t in a document d is defined as: 


$$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D) \tag{1}$$


where tf(t, d) is the term frequency of term t in document d and refers to the 
number of occurrences of the term t in the document (title, body or tag). Also, 
idf(t, D) is the inverse document frequency of term t in the set of all documents 
D, and is used as a normalizing factor to indicate how common the term is in 
the corpus. In our implementation (we used scikit-learn), idf(t, D) is equal to 
1 + log((1 + |D|)/(1 + |d_t|)), where |d_t| is the number of documents containing the 
term t, i.e. the number of titles, bodies or tags that include the relevant term. 
Intuitively, very common terms (e.g. “Java” or “Exception”) may act as noise 
for our dataset, as they could appear in semantically different posts. 
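
To make the weighting concrete, the following minimal sketch computes equation (1) with the smoothed idf given above, using only the Python standard library. The function name and the dictionary-based vector format are assumptions made for illustration; in addition, scikit-learn's vectorizer L2-normalizes each vector by default, a step omitted here.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    # Smoothed idf: 1 + log((1 + |D|) / (1 + |d_t|)), natural logarithm.
    idf = {t: 1.0 + math.log((1 + n_docs) / (1 + df)) for t, df in doc_freq.items()}
    vectors = []
    for doc in documents:
        counts = Counter(doc)  # tf(t, d): raw number of occurrences
        vectors.append({t: tf * idf[t] for t, tf in counts.items()})
    return vectors
```

A term that occurs in every document (such as “java” in a Java-only corpus) receives the minimum idf of 1, so its weight reduces to its raw term frequency.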


fastText Model fastText is a neural language model proposed by Facebook’s 
AI Research (FAIR) lab [2,12]. Practically, fastText is a shallow neural network 
that is trained in order to reconstruct linguistic contexts of words. In our case, 
we transform the terms of the documents in one-hot encoding format and give 
the documents as input to the network during the training step. The result, i.e. 
the output of the hidden layer, is actually a set of word vectors. So, in this case, 
the resulting model is one where terms are represented as vectors. Given proper 
parameters, these vectors should incorporate semantic information, so that our 
model will have learned from the context. 

We used the official implementation of fastText⁷, selected the skip-gram 
variation of the model, and set it up to use character n-grams of size 3, 4, 5, 6, and 7. 
Upon experimenting with the parameters of the model, we ended up building a 
model with 300 dimensions and training it for 25 epochs. We used the negative 
sampling cost function (with number of negative samples equal to 10) and set 
the learning rate to 0.025 and the window size (i.e. number of terms that are 
within the context of a word) to 10. Also, the sampling threshold was set to 
10^-6, while we also dropped any words with fewer than 5 occurrences. Upon 
extracting all word vectors, we create the vector of each document (title, 
body or tags) by averaging over its word vectors. 
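
The averaging step could look like the sketch below. It assumes that the word vectors are available as a plain dictionary from term to list of floats (a hypothetical format), skips terms without a vector, and returns an all-zero vector for empty documents; neither choice is specified in the text, and fastText itself can also compose vectors for unseen words from their character n-grams.

```python
def document_vector(tokens, word_vectors, dim):
    """Average the vectors of the tokens that are in the vocabulary."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # assumption: all-zero vector for empty documents
    # zip(*known) groups the values of each dimension across all word vectors.
    return [sum(component) / len(known) for component in zip(*known)]
```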

Finally, the output of either of our two models is a set of vectors, one for the 
title, one for the body and one for the tags of each post. In the case of Tf-Idf 
the dimensions of the vector are equal to the total number of words, while in the 
case of fastText there are 300 dimensions. In both cases, the vectors are stored 
in a vector index, which also contains ids that point to the original posts. 


3.4 Metadata Extractor 


As metadata, we extract the named entities of each post, i.e. useful terms that 
may help the developer understand the semantics behind each post. To do so, we 
build a Conditional Random Fields (CRF) classifier [15], which performs named 
entity recognition based on features extracted from the terms themselves and 
from their context (neighboring terms). The goal is to estimate the probability 
that a term belongs to one of the available categories. To create a feature set for 
each term, we initially use two models. 

⁷ https://github.com/facebookresearch/fastText 

First, we employ the Brown hierarchical clustering algorithm [4] to generate 
a binary representation of all terms in the corpus. The algorithm clusters all 
terms in a binary tree structure. An example fragment of such a tree is shown 
in Figure 5. The leaf nodes of the tree are all the terms, so by traversing the 
tree from the root to a leaf we are given a binary representation known as 
bitstring for the corresponding term. Semantically similar terms are expected 
to share more similar tree paths. For instance, in the fragment of Figure 5, the 
terms ‘array’ and ‘table’ have binary representations 00100000 and 00100001 
respectively, which are quite similar, as is their semantic meaning. The terms 
‘collection’ and ‘list’ are also similar, yet somewhat less, as their representations 
(001001 and 00100011 respectively) differ more. 


00100 
    001000 
        0010000 
            00100000 : array 
            00100001 : table 
        0010001 
            00100010 : sequence 
            00100011 : list 
    001001 : collection 


Fig. 5. Example Fragment of Binary Tree generated by the Brown Algorithm 
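
One simple way to quantify how much of their tree path two terms share is the length of the common prefix of their bitstrings. The helper below is a hypothetical illustration of this idea, not part of the described system; it is applied to the bitstrings as they appear in Figure 5.

```python
def shared_prefix_length(a, b):
    """Number of leading bits two Brown bitstrings have in common."""
    length = 0
    for bit_a, bit_b in zip(a, b):
        if bit_a != bit_b:
            break
        length += 1
    return length
```

For the fragment of Figure 5, ‘array’ (00100000) and ‘table’ (00100001) share 7 leading bits, whereas ‘list’ (00100011) and ‘collection’ (001001) share only 5.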


Secondly, we use the fastText model of subsection 3.3. As already mentioned, 
our model extracts vector representations of terms so that semantically similar 
terms have vectors that are closer to each other (where proximity is computed 
using cosine similarity, see section 3.5). To reduce the size of these vectors (and 
thus avoid the curse of dimensionality), we further employ K-Means to cluster 
them into 5 configurations with different numbers of clusters (500, 1000, 1500, 
2000, and 3000), an idea originating from similar natural language approaches 
[33, 34]. Thus, instead of using the term vector, we use 5 features for each term, 
each one corresponding to the id of the cluster to which the term is assigned. 

Upon applying the two models, we finally build the feature set for the CRF 
classifier. Given each term t_i, its preceding term t_{i-1}, and its following term t_{i+1}, we 
define their Brown bitstrings as b_i, b_{i-1}, and b_{i+1} respectively, and we also define 
their K-Means cluster assignments as k_i, k_{i-1}, and k_{i+1} respectively. Note that 
k_i includes all 5 cluster configurations used, thus producing on its own five 
features. Using these definitions, we build the following feature set: 


— the term itself (t_i), and its combination with the preceding term (t_{i-1}t_i), 
and the following term (t_i t_{i+1}); 

— the ids of the cluster assigned by K-Means to the term (k_i), the preceding 
term (k_{i-1}), and the following term (k_{i+1}); 

— the bigram of the ids of K-Means clusters for the three terms (k_{i-1}k_i k_{i+1}); 

— the bitstrings of the term (b_i), the preceding term (b_{i-1}), and the following 
term (b_{i+1}); 

— the bigram of the three bitstrings (b_{i-1}b_i b_{i+1}); 

— the prefixes with length 2, 4, 6, 8, 10, 12 of each one of the three bitstrings 
(e.g. for a bitstring 100100 the prefixes are 10, 1001, and 100100). 
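
A possible way to assemble most of these features for a single term is sketched below. The function names, the feature keys, and the two lookup dictionaries (`bitstrings` and `clusters`) are hypothetical; the combined cluster-id and bitstring features are omitted for brevity.

```python
PREFIX_LENGTHS = (2, 4, 6, 8, 10, 12)


def bitstring_prefixes(bitstring):
    """Prefixes of length 2, 4, ..., 12 that fit within the bitstring."""
    return [bitstring[:n] for n in PREFIX_LENGTHS if n <= len(bitstring)]


def term_features(terms, bitstrings, clusters, i):
    """Features of terms[i]; bitstrings/clusters are hypothetical lookup dicts.

    clusters maps a term to a tuple of five K-Means cluster ids.
    """
    features = {"t": terms[i]}
    for name, j in (("prev", i - 1), ("cur", i), ("next", i + 1)):
        if not 0 <= j < len(terms):
            continue  # no neighbour at the sequence boundary
        term = terms[j]
        bits = bitstrings.get(term, "")
        features[f"b_{name}"] = bits
        for prefix in bitstring_prefixes(bits):
            features[f"b_{name}_p{len(prefix)}"] = prefix
        for c, cluster_id in enumerate(clusters.get(term, ())):
            features[f"k{c}_{name}"] = cluster_id
    if i > 0:
        features["t_prev_t"] = terms[i - 1] + " " + terms[i]
    if i + 1 < len(terms):
        features["t_t_next"] = terms[i] + " " + terms[i + 1]
    return features
```

One dictionary per term, collected in sequence order, is a common input format for CRF toolkits, although the toolkit used here is not named in the text.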


Finally, our features are augmented by employing the dataset proposed by 
Ye et al. [31,32]. The dataset comprises annotated entities extracted from Stack 
Overflow that lie in five categories: API calls, programming languages, platforms 
(e.g. Android), tools-frameworks (e.g. Maven), and standards (e.g. TCP). For 
each of these categories, we check whether the term is found in the corresponding 
dataset file and produce a true/false decision that is added as one more feature 
in our feature set. After that, we apply the CRF classifier for all terms and build 
a metadata index. Using this index we can produce a list of semantically rich 
named entities for each post in the dataset. 


3.5 Querying Engine 


As already mentioned in subsection 3.3, the vector index comprises a set of 
vectors, three for each question post, corresponding to the title, the body and the 
tags of the post. When a developer issues a new query, it is initially preprocessed 
and tokenized, and then it is vectorized using either of our models. After that, 
we now have to produce a similarity score between each question post p and the 
query q of the developer. To do so, we use the following equation: 


$$\mathrm{sim}_{model}(q, p) = \frac{\mathrm{csim}(v_q, v_{title(p)}) + \mathrm{csim}(v_q, v_{body(p)}) + \mathrm{csim}(v_q, v_{tags(p)})}{3} \tag{2}$$


where v_q is the vector of the query and v_{title(p)}, v_{body(p)}, and v_{tags(p)} are the 
vectors of the title, the body, and the tags of the question post respectively. 
Finally, csim is the cosine similarity, which is computed for two vectors v_1 and 
v_2 as follows: 


$$\mathrm{csim}(v_1, v_2) = \frac{v_1 \cdot v_2}{|v_1| \, |v_2|} \tag{3}$$
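
The two similarity computations can be sketched in a few lines of plain Python. The function names mirror the notation of the equations; returning 0 for zero-length vectors is an assumption, as this case is not discussed in the text.

```python
import math


def csim(v1, v2):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: the text does not say how zero vectors are handled
    return dot / (norm1 * norm2)


def sim_model(v_q, v_title, v_body, v_tags):
    """Mean cosine similarity of the query with the title, body and tags vectors."""
    return (csim(v_q, v_title) + csim(v_q, v_body) + csim(v_q, v_tags)) / 3
```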


Apart from the two models described so far, we also created a hybrid model 
by taking the average between the two scores computed by our models: 


$$\mathrm{sim}_{hybrid}(q, p) = \frac{\mathrm{sim}_{tf\text{-}idf}(q, p) + \mathrm{sim}_{fastText}(q, p)}{2} \tag{4}$$


This hybrid model incorporates the advantages of fastText, while giving more 
weight than fastText alone to well-formed queries (i.e. queries with expected terms). 

Finally, the user is presented with a list of possible results to the query, 
ranked according to their score. Each result contains information extracted from 
a question post and the corresponding answer posts. Specifically, we include the 
title of the question post, the snippets extracted from the answer posts, the links to 
the question and answer posts (should the developer want to examine them), the 
Stack Overflow score of the answer posts, and the 8 most frequent named entities 
among all answer posts of the relevant question post. For example, assuming our 
system receives the query “How to read from text file?”, an example result is 
shown in Table 1. The developer can of course choose to check the second most 
relevant snippet of this question post, or even check another question post. 
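
The hybrid scoring and the ranking of results could be combined as in the following sketch. It assumes that the two model scores of every candidate post have already been computed and are stored in dictionaries keyed by post id (hypothetical names); the cut-off parameter is likewise an assumption.

```python
def rank_hybrid(tfidf_scores, fasttext_scores, top_n=20):
    """Rank post ids by the mean of their Tf-Idf and fastText scores."""
    hybrid = {
        post_id: (tfidf_scores[post_id] + fasttext_scores[post_id]) / 2
        for post_id in tfidf_scores
    }
    # Highest hybrid score first; keep only the top_n results.
    return sorted(hybrid.items(), key=lambda item: item[1], reverse=True)[:top_n]
```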
Table 1. Example StackSearch Response to Query “How to read from text file?” 


Type                Data 

Post title          Reading a plain text file in Java 

Question post link  https://stackoverflow.com/questions/4716503 

Top 8 labels        FileReader, BufferedReader, FileInputStream, InputStreamReader, 
                    Scanner.hasNext, Files.readAllBytes, FileUtils.readLines, Scanner 

Snippet 1           Scanner in = new Scanner(new FileReader("file.txt")); 
                    StringBuilder sb = new StringBuilder(); 
                    while(in.hasNext()) { 
                        sb.append(in.next()); 
                    } 
                    in.close(); 
                    outString = sb.toString(); 

Answer post link    https://stackoverflow.com/questions/4716556 

Answer post score   117 


4 Evaluation 


To fully evaluate StackSearch, we perform three experiments. The first 
experiment involves annotating the results of common programming problems and 
is expected to illustrate the usefulness of our system. The second experiment 
relies on post links and is used to provide proof that our system is effective 
(and minimize possible threats to validity). Finally, for our third experiment, 
we compare StackSearch to the tool CROKAGE [24], which is quite similar to 
our system. Comparing StackSearch with other approaches was not possible, 
since several systems are not maintained and/or are not publicly available 
(to facilitate researchers with similar challenges, we uploaded our code at 
https://github.com/AuthEceSoftEng/StackSearch). 


4.1 Evaluation using Programming Queries 


We initially evaluate StackSearch using a set of common programming queries 
shown in Table 2. The dataset includes certain queries that are semantically very 
similar, which are marked as belonging to the same group, to determine whether 
our method captures the semantic features of the dataset. Queries in the same 
group call for the same solutions, i.e. their only difference is in the phrasing. 

We evaluate all three implementations of our system, the Tf-Idf model, the 
fastText model, and the hybrid model. For each implementation, upon giving the 
queries as input, we retrieve the first 20 results and annotate them as relevant or 
non-relevant. A result is marked as relevant if its snippet covers the functionality 
that is described by the query. We gathered the results of all three algorithms 
together and randomly permuted them, so the annotation was performed without 
any prior knowledge about which result corresponds to each model, in order to 
be as objective as possible. 


Table 2. Dataset used for Semantically Evaluating StackSearch 


ID  Query                                       Group 

1   How to read a comma separated file?         1 
2   How to read a CSV file?                     1 
3   How to read a delimited file?               1 
4   How to read input from console?             2 
5   How to read input from terminal?            2 
6   How to read input from command prompt?      2 
7   How to play an mp3 file?                    3 
8   How to play an audio file?                  3 
9   How to compare dates?                       4 
10  How to compare time strings?                4 
11  How to dynamically load a class?            5 
12  How to load a jar/class at runtime?         5 
13  How to calculate checksums for files?       6 
14  How to calculate MD5 checksum for files?    6 
15  How to iterate through a hashmap?           7 
16  How to loop over a hashmap?                 7 
17  How to split a string?                      8 
18  How to handle an exception?                 9 


For each query, we evaluate each implementation by computing the average 
precision of the results. Given a ranked list of results, the average precision is 
computed by the following equation: 


$$\mathrm{AveP} = \frac{\sum_{k} \left(P(k) \cdot \mathrm{rel}(k)\right)}{\text{number of relevant results}} \tag{5}$$


where P(k) is the precision at k and corresponds to the percentage of relevant 
results in the first k, and rel(k) denotes if the result in the position k is relevant. 
We also use the mean average precision, defined as the mean of the average 
precision values of all queries. 
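
The two metrics can be computed as in the sketch below, where each ranked result list is represented as a list of 0/1 relevance flags (a hypothetical input format). As an assumption, the "number of relevant results" in the denominator is taken to be the number of relevant results within the examined list.

```python
def average_precision(relevance, k=None):
    """AveP over a ranked list of 0/1 relevance flags, optionally cut at k."""
    ranked = relevance[:k] if k is not None else relevance
    hits = 0
    total = 0.0
    for position, rel in enumerate(ranked, start=1):
        if rel:
            hits += 1
            total += hits / position  # P(position) * rel(position)
    return total / hits if hits else 0.0


def mean_average_precision(relevance_lists, k=None):
    """Mean of the average precision values of all queries."""
    return sum(average_precision(r, k) for r in relevance_lists) / len(relevance_lists)
```

For a ranked list whose first and third results are relevant, the precision values at those positions are 1 and 2/3, so the average precision is (1 + 2/3)/2 ≈ 0.83.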

We calculated the average precision at 10 and 20 results. The values for each 
query are shown in Figure 6. As shown in these graphs, the fastText and the 
hybrid models clearly outperform the Tf-Idf model, which is expected as they 
incorporate semantic information. We also note that the hybrid implementation 
is even more effective than fastText for most queries. Interestingly, there are 
certain queries in which Tf-Idf outperforms one or both of the other 
implementations. Consider, for example, query 17; this is a very specific query with clear 
terms (i.e. developers would rarely form such a query without using the term 
‘string’) so there is not really any use for semantics. For most queries, however, 
better results are proposed by fastText or by our hybrid model. 

We note, in particular, the results for queries in the same group (divided 
by gray lines in the graphs of Figure 6). In the second group, for instance, 
query 4, which refers to input from the console, returns multiple useful results 
using any of the three models. The results, however, are quite different for queries 
5 and 6, which are similar to query 4 except for the replacement of the term ‘console’ with 
‘terminal’ and with ‘command prompt’ respectively. This indicates that our word 
embedding successfully captures the semantics of the text and considers the 
aforementioned terms as synonyms. This advantage of our system is also clear 
in group 1 (comma-separated vs CSV vs delimited file), group 4 (dates vs time 
strings), etc., and even in more difficult semantic relationships, such as the one 
of group 5 (i.e. loading dynamically vs at runtime). 


Fig. 6. Average Precision for the three Implementations (a) at 10, and (b) at 20 Results 


Finally, we calculated the mean average precision for the same configurations 
as before. The values for the three implementations are shown in Figure 7a, where 
it is clear once again that the word embeddings outperform the Tf-Idf model, 
while our hybrid model is the most effective of the three models. 

To further outline the differences among the models we also computed the 
mean search length. The search length is a very useful metric since it intuitively 
simulates the process used when searching for relevant results. The metric is 
defined as the number of non-relevant results that one must examine in order to 
find a number of relevant results. We computed the search length for all queries 
for finding from 1 up to 10 relevant results. 


Fig. 7. Results depicting (a) the Mean Average Precision, and (b) the Mean Search 
Length, for the three Implementations 


Averaging over all queries provides the mean search length, the results of 
which are shown in Figure 7b. The results are again encouraging for our proposed 
models. Indicatively, to find the first useful result, the developer has to examine 
fewer than 0.1 irrelevant results on average for fastText and for the hybrid 
model, whereas using Tf-Idf requires examining 1.5 irrelevant results. Furthermore, 
when the developer skims over the results of fastText, he/she will only 
need to view 2.11 irrelevant snippets on average before finding the first 5 relevant 
ones. Using the hybrid model, he/she will only need to see 1.22. Tf-Idf is clearly 
outperformed in this case, providing on average almost 7 irrelevant results along 
with the first 5 relevant. Similar conclusions can be drawn for the first 10 relevant 
results. In this case, the developer would need to examine around 17.5, 7.5, and 
5.5 results on average, for Tf-Idf, fastText, and our hybrid model, respectively. 
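
For a single query, the search length as defined above can be computed with the following sketch, again over a ranked list of 0/1 relevance flags. Returning `None` when the list contains fewer relevant results than requested is an assumption; how such cases enter the mean is not stated in the text.

```python
def search_length(relevance, n_relevant):
    """Non-relevant results examined before the n-th relevant one is found."""
    found = 0
    irrelevant = 0
    for rel in relevance:
        if rel:
            found += 1
            if found == n_relevant:
                return irrelevant
        else:
            irrelevant += 1
    return None  # assumption: undefined when fewer than n relevant results exist
```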


4.2 Evaluation using Post Links 


The main goal of the previous subsection was to illustrate the potential of our 
word embedding models. The results, especially for the groups of queries, have 
shown that our models indeed capture the semantics of text. As already 
mentioned, the annotation process was performed in such a way as to limit any threats 
to validity. Nevertheless, to further strengthen the objectivity of the results, we 
perform one more experiment, which is described in this subsection. 

In the absence of a third-party annotated Stack Overflow dataset, we 
decided to evaluate our models using the post links provided by Stack 
Overflow, an idea found in [8]. In Stack Overflow, the presence of a link between two 
questions is an indicator that the two questions are similar. Note, of course, that 
the opposite assumption, i.e. that any two questions that are not linked are not 
similar to each other, is not necessarily correct. There are many questions that 
are asked and perhaps not linked to similar ones. In our evaluation, however, we 
formulate the problem as a search/retrieval scenario, so we only use post links 
to determine whether our models can retrieve objectively relevant results. 

To create our link evaluation dataset, we first extracted all post links of 
Java question posts. After that, for performance reasons, we dropped any posts 
without snippets and any posts with Stack Overflow score lower than or equal to -3, as 
these are not within the scenario of a system that retrieves useful snippets. These 
criteria reduced the number of question posts to roughly 200000 (as opposed to 
the original dataset that had approximately 1.3 million question posts). These 
question posts have approximately 37000 links, reinforcing our assumption that 
non-linked questions are not necessarily dissimilar. 

We execute StackSearch with all three models giving as queries the titles of all 
question posts of the dataset. For each query, we retrieve the first 20 results (as 
we may assume this is the maximum a developer would normally examine). We 
determine how many of these 20 results are linked to the specific question post, 
and compute the percentage of relevant results compared to the total number of 
relevant post links of the question post. By averaging over all queries (i.e. titles 
of question posts of the dataset), we compute the percentage of relevant links 
retrieved on average for each model. The results are shown in Figure 8. 


Fig. 8. Percentage of Relevant Results (compared to the number of Links of each 
Question Post) in the first 20 Results for the three Implementations 
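
The per-query measure used in this experiment, i.e. the share of a question post's links that are found among the first 20 results, could be computed as in this sketch. The function and argument names are hypothetical, and skipping questions without links by returning `None` is an assumption.

```python
def linked_recall(retrieved_ids, linked_ids, top_n=20):
    """Fraction of a question's linked posts that appear in the top results."""
    if not linked_ids:
        return None  # questions without links cannot be scored this way
    top = set(retrieved_ids[:top_n])
    return len(top & set(linked_ids)) / len(linked_ids)
```

Averaging this value over all queries yields the percentage reported for each model.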


At first, one may note that the results for all models are below 30%, a rather 
low number, which is however expected given the shortcomings of our dataset. 
Many retrieved results are actually relevant, but they are not linked to the 
question posts of the queries. In any case, we are given an objective relative 
comparison of the three models, which provides some interesting insights. 
One observation is that Tf-Idf outperforms fastText. This is 
not totally unexpected if we consider that the post links of Stack Overflow are 
created by the community; it is therefore possible that posts with similar 
meanings but different key terms are not linked. As a result, fastText may discover 
several posts that should be linked, yet they are not. On the other hand, Tf-Idf 
focuses on identical terms, which are rather easier to discover using the Stack 
Overflow service. In any case, however, our hybrid model outperforms Tf-Idf 
and fastText, as it combines the advantages of both. 


4.3 Comparative Evaluation 


Upon demonstrating the effectiveness of StackSearch in the previous subsections, 
we now proceed to compare it with a similar system, the tool CROKAGE. To 
do so, we have employed the dataset proposed by CROKAGE [24]. The dataset 
involves 48 programming queries, similar to those introduced in subsection 4.1. 
The queries include diverse tasks, such as comparing dates, resizing images, 
pausing the current thread, etc. 

Given that our dataset comprises Stack Overflow posts, it can be used to 
assess both tools. Thus, we issued the queries to both StackSearch and 
CROKAGE. The results of the queries, which are Stack Overflow posts, had originally 
been annotated by two annotators (whose annotations were merged), marking any post 
as relevant if it addresses the query with a feasible amount of changes [24]. We 
have therefore used these annotations and only had to update a small part of them in 
order to make sure that they are on par with our dataset, which includes the 
latest data dump of Stack Overflow. As before, for each query we have calculated 
the average precision at 5 and 10 results as well as the search length for finding 
1 up to 10 relevant results. The mean average precision and the mean search 
length results for the two tools are shown in Figures 9a and 9b, respectively 
(results per query are omitted due to space limitations). 


Fig. 9. Results depicting (a) the Mean Average Precision, and (b) the Mean Search 
Length, for StackSearch and CROKAGE 


Both tools seem to be effective on the provided dataset. Concerning mean 
average precision, StackSearch outperforms CROKAGE both at 5 and at 10 
results, indicating that it retrieves more useful results on average. Moreover, 
their difference is more noticeable when fewer results are 
required, indicating that StackSearch provides a better ranking. 

This difference is also illustrated by the mean search length for the two 
approaches. Indicatively, using StackSearch, the developer will need to examine 
only 0.66 irrelevant snippets on average, before finding the first relevant one (the 
corresponding value for CROKAGE is 1.42). Our tool also performs better for 
finding the second and third relevant results, while the two tools perform equally 
well for finding five or more results. 


5 Conclusion 


Although several API usage and snippet mining solutions have been proposed, 
most of them do not account for the semantics of the source code and the 
surrounding text. Furthermore, most contemporary systems do not employ word 
embeddings to enable semantic-aware retrieval of snippets, and are limited 
either by the format of their input, which is not natural language, or by their 
output, which is not ready-to-use snippets. In this work, we have created a novel 
snippet mining system that extracts snippets from Stack Overflow and employs 
word embeddings to model code as well as contextual information. Based on 
our evaluation, we conclude that the hybrid model of StackSearch effectively 
extracts the semantics of the data and outperforms both our baselines (Tf-Idf 
and fastText) as well as the snippet mining tool CROKAGE. Finally, our 
system accompanies the retrieved snippets with useful metadata that convey the 
meaning of each post. 

Future work lies in several directions. First, we may employ a more 
sophisticated ranking scheme using more information from Stack Overflow (e.g. the 
Stack Overflow score of the snippet’s answer post) or even from other sources 
(e.g. the reuse rate of Stack Overflow snippets in GitHub [1]) and assess the 
influence of that information on the effectiveness of the scheme. Furthermore, we 
could employ different word embedding techniques or even variations of fastText, 
such as the combination of the In-Out vectors of fastText [19]. We could also 
further investigate our hybrid solution, implementing a more complex scheme 
than averaging the scores of the two models. Finally, we could further 
assess StackSearch using a survey to ask developers whether the system actually 
retrieves useful snippets and whether it reduces the effort required for finding 
and integrating reusable snippets. 


Abstract. Non-determinism in a concurrent or distributed setting may 
lead to many different runs or executions of a program. This paper 
presents a method to reproduce a specific run for non-deterministic 
actor or active object systems. The method is based on recording traces 
of events reflecting local transitions at so-called stable states during 
execution; i.e., states in which local execution depends on interaction with 
the environment. The paper formalizes trace recording and replay for a 
basic active object language, to show that such local traces suffice to 
obtain global reproducibility of runs; during replay different objects may 
operate fairly independently of each other and in parallel, yet a program 
under replay has a guaranteed deterministic outcome. We then show that 
the method extends to the other forms of non-determinism as found in 
richer active object languages. Following the proposed method, we have 
implemented a tool to record and replay runs, and to visualize the 
communication and scheduling decisions of a recorded run, for Real-Time 
ABS, a formally defined, rich active object language for modeling timed, 
resource-aware behavior in distributed systems. 


1 Introduction 


Non-determinism in a concurrent or distributed setting leads to many different 
possible runs or executions of a given program. The ability to reproduce and 
visualize a particular run can be very useful for the developer of such programs. 
For example, reproducing a specific run representing negative (or unexpected) 
behavior can be beneficial to eliminate bugs which occur only in a few out 
of many possible runs (so-called Heisenbugs). Conversely, reproducing a run 
representing positive (and expected) behavior can be useful for regression testing 
for new versions of a system. 

Deterministic replay is an emerging technique to provide deterministic 
executions of programs in the presence of different non-deterministic factors [1]. 
In a first phase, the technique consists of recording sufficient information in a 
trace during a run to reproduce the same run during a replay in a second phase. 
Approaches to reproduce runs of non-deterministic systems can be classified as 
either content-based or ordering-based replay. Content-based replay records the 
results of all non-deterministic operations whereas ordering-based replay records 
the ordering of non-deterministic events. 

This paper considers deterministic replay for non-deterministic runs of 
Active Object languages [2], which combine the asynchronous message passing of 
Actors with object-oriented abstractions. Compared to standard OO languages, 
these languages decouple communication and synchronization by communicating 
through asynchronous method calls without transfer of control and by 
synchronizing via futures. We develop a method to reproduce the runs of active objects. 
The method is ordering-based, as we represent the parallel execution of active 
objects as traces of events. We show that locally recording events at so-called 
stable states suffices to obtain deterministic replay. In these states, local 
execution needs to interact with the environment, e.g., to make a scheduling decision 
or to send or receive a message. We formalize execution with record and replay 
for a basic active object language, and show that its executions enjoy confluence 
properties which can be described using such traces. These confluence properties 
justify the recording and replay of local traces to reproduce global behavior. 
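
As an informal illustration of ordering-based record and replay for a single object (not the formalization developed in this paper), consider the toy sketch below. All names are hypothetical: the object keeps a queue of pending tasks, records the id of each task at the moment it is scheduled, and can later be forced to follow a previously recorded trace.

```python
import random


class ActiveObject:
    """Toy active object: a queue of pending tasks and a local trace."""

    def __init__(self, state):
        self.state = state
        self.queue = {}   # task id -> function(state) -> new state
        self.trace = []   # task ids in the order they were scheduled

    def send(self, task_id, task):
        """Enqueue a task; the caller does not wait for it to run."""
        self.queue[task_id] = task

    def run(self, replay=None, rng=random):
        """Run all queued tasks; follow `replay` if given, else pick randomly."""
        replay = list(replay) if replay is not None else None
        while self.queue:
            if replay is not None:
                task_id = replay.pop(0)  # scheduling decision dictated by the trace
            else:
                task_id = rng.choice(sorted(self.queue))  # non-deterministic choice
            self.trace.append(task_id)  # record the local scheduling event
            self.state = self.queue.pop(task_id)(self.state)
        return self.state
```

Running an object once without `replay` produces a trace; passing that trace to a fresh object with the same queued tasks reproduces the same final state. Only the order of local scheduling decisions is stored, not the results of the tasks, which is what distinguishes ordering-based from content-based replay.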

Active object languages may also contain more advanced features [2], such as 
cooperative scheduling [3,4], concurrent object groups [3,5] and timed, 
resource-aware behavior [6]. With cooperative scheduling, an object may suspend its 
current task while waiting for the result of a method call and instead schedule 
a different task. With concurrent object groups, several objects share an actor’s 
lock abstraction. With timed, resource-aware behavior, local execution requires 
resources from resource-centers (e.g., virtual machines) to progress. These 
features introduce additional non-determinism in the active object systems, in 
addition to the non-determinism caused by asynchronous calls. We show that the 
proposed method extends to handle these additional sources of non-determinism. 

The proposed method to deterministically replay runs has been realized for 
Real-Time ABS [6], a modeling language with these advanced features, which has 
been used to analyze, e.g., industrial scale cloud-deployed software [7], railway 
networks [8], and complex low-level multicore systems [9,10]. Whereas the 
language supports various formal analysis techniques, most validation of complex 
models (at least in an early stage of model development) is based on simulation. 
The tracing capabilities have a small enough performance impact to be enabled 
by default in the simulator. The simulator itself is implemented as a distributed 
system in Erlang [11]. The low performance overhead comes from only recording 
local events in each actor, which does not impose any additional communication 
or synchronization, which are typically bottlenecks in a distributed system. 


Contributions. Our main contributions can be summarized as follows: 


— we propose a method to reproduce runs for active object systems based on 
recording events reflecting local transitions from stable object states; 

— we provide a formal justification for the method in terms of confluence and 
progress properties for ordering-based record & replay for a basic actor 
language with asynchronous communication and synchronization via futures; 

— we show that the method extends to address additional sources of 
non-determinism as found in richer active object languages; and 

— we provide an implementation of the proposed method to record, replay and 
visualize runs for the active object modeling language Real-Time ABS. 
    class C {
      Int n = 1;
      Unit m1() { n = n - 1; }
      Unit m2() { n = n * 3; }
    }

    // Main block
    {
      C o = new C();
      o!m1();
      o!m2();
    }

Fig. 1: A simple program, with two possible results.

[Figure omitted: two trace diagrams between main and o, showing invocation and schedule events.]

Fig. 2: The executions leading to the two different results for the simple program.


Paper overview. Section 2 provides a motivating example, Section 3 considers the
problem of reproducibility for a formalization of a basic active object language 
and Section 4 formalizes record and replay over the operational semantics of the 
basic language. Section 5 considers reproducibility for extensions to the basic 
language. Section 6 presents our implementation of the method for Real-Time 
ABS. Section 7 discusses related work and Section 8 concludes the paper. 


2 Motivating Example 


Consider the program in Fig. 1. It consists of a class C, with a single integer 
field, initialized to 1 and two methods m1 and m2. The main block of the pro- 
gram creates an active object o as an instance of the class, and performs two 
asynchronous calls on o, o!m1() and o!m2() respectively. Since the calls are 
asynchronous, the caller can proceed to make the second call immediately, with- 
out waiting for the first call to complete. The two calls are placed in the queue 
of o and scheduled in some order for execution by o. (We here assume method 
execution is atomic, but this assumption will be relaxed in Section 5.) 

Thus, even the execution of this very simple program can lead to two different
results, depending on whether o!m1() is scheduled before o!m2() or, conversely,
o!m2() is scheduled before o!m1(). In the first case, the field n (which
is initially 1) will first be decremented by 1 and then be multiplied by 3,
resulting in a final state in which the field n has the value 0. In the second case,
the field n is first multiplied by 3, then decremented by 1, resulting in a final
state in which the field n has the value 2. Fig. 2 depicts the two cases (using
the visualization support in our tool, described in Section 6.3). Note that this 
problem still occurs for languages with ordered message passing between two 
actors (e.g., Erlang [11]) when the two calls are made by different callers. 
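
The two outcomes can be checked with a small sketch. The Python class below is only a stand-in for class C of Fig. 1, and the function run is a hypothetical helper that executes the queued calls atomically in a given order; neither is part of ABS or of our tool.

```python
from itertools import permutations


class C:
    """Python stand-in for class C of Fig. 1 (one integer field n)."""

    def __init__(self):
        self.n = 1

    def m1(self):
        self.n = self.n - 1

    def m2(self):
        self.n = self.n * 3


def run(schedule):
    """Execute the queued calls atomically, in the given order."""
    o = C()
    for method in schedule:
        getattr(o, method)()
    return o.n


# Both calls sit in the queue of o; the runtime may pick either order.
results = {s: run(s) for s in permutations(["m1", "m2"])}
for schedule, n in results.items():
    print(" then ".join(schedule), "gives n =", n)
```

Enumerating the permutations of the queue makes the source of non-determinism explicit: the program text is the same in both runs, only the scheduling order differs.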

The selection of run to execute is decided by the runtime system and is
thus non-deterministic for the given source program. In general, there can be
many more than two possible runs for a parallel active object system. If only
a few of the possible runs exhibit a particular behavior (e.g., a bug), it is
useful to be able to reproduce a particular run of the given program.
We propose a method to instrument active object systems which allows global
reproducibility of runs through local control for each active object.


3 A Formal Model of Reproducibility 


To formalize the problem of global reproducibility through local control for ac- 
tive object systems, we consider a basic active object language in which non- 
determinism stems from the order in which method calls are selected from the 
queue of the active objects. 


3.1 A Basic Active Object Language 


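Consider a basic active object language with asynchronous method calls and
synchronization via futures. The language has a Java-like syntax, given in
Fig. 3. Let T, C and m range over type, class and method names, respectively,
and let e range over side-effect free expressions. Overlined terms denote
possibly empty lists over the corresponding syntactic categories (e.g.,
$\overline{e}$ and $\overline{T}$).

$$\begin{array}{rcl}
P &::=& \overline{CL}\ \{\overline{T\ x};\ s\} \\
CL &::=& \mathtt{class}\ C\ \{\overline{T\ x};\ \overline{M}\} \\
M &::=& T\ m(\overline{T\ x})\ \{\overline{T\ x};\ s\} \\
s &::=& s;\ s \mid \mathtt{skip} \mid x = rhs \mid \mathtt{if}\ e\ \{s\}\ \mathtt{else}\ \{s\} \mid \mathtt{while}\ e\ \{s\} \mid \mathtt{return}\ e \\
rhs &::=& e \mid \mathtt{new}\ C(\overline{e}) \mid e!m(\overline{e}) \mid e.\mathtt{get}
\end{array}$$

Fig. 3: BNF for the basic active object language.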

A program P consists of a list CL of class declarations and a main block 
{T x; s}, with variables x of type T and a statement s. A class C declares fields 
(both with types T) and contains a list M of methods. A method m has a return 
type, a list of typed formal parameters and a method body which contains local 
variable declarations and a statement s. Statements are standard; assignments 
x = rhs allow expressions with side-effects on the right-hand side rhs. 

Asynchronous method calls decouple invocation from synchronization. The 
execution of a call f = o!m(args) corresponds to sending a message m(args) 
asynchronously to the callee object o and initializes a future, referenced by f, 
where the return value will be stored. The statement x = f.get retrieves the 
value stored in the future f. This operation synchronizes with the method return; 
i.e., the execution of this statement blocks the active object until the future f has 
received a value. Messages are not assumed to arrive in the same order as they 
are sent. The selection of messages in an object gives rise to non-determinism in 
the execution. An example of a program in this language was given in Section 2. 
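
The decoupling of invocation and synchronization can be illustrated by the following single-threaded sketch. The classes Future and ActiveObject and the function get are hypothetical names chosen for this illustration; in particular, get drives the callee until the future is resolved, which only approximates the blocking of the caller in a truly concurrent setting.

```python
import random


class Future:
    """Placeholder for a return value; it starts out unresolved."""

    def __init__(self):
        self.value = None
        self.resolved = False

    def resolve(self, value):
        self.value, self.resolved = value, True


class ActiveObject:
    def __init__(self, rng):
        self.queue = []   # unordered pool of pending invocations
        self.rng = rng    # source of the scheduling non-determinism

    def call(self, method, *args):
        """f = o!m(args): enqueue the message and return a fresh future."""
        f = Future()
        self.queue.append((f, method, args))
        return f

    def step(self):
        """Schedule one pending invocation, chosen non-deterministically."""
        index = self.rng.randrange(len(self.queue))
        f, method, args = self.queue.pop(index)
        f.resolve(method(*args))


def get(f, callee):
    """x = f.get: wait (here: let the callee run) until f is resolved."""
    while not f.resolved:
        callee.step()
    return f.value


o = ActiveObject(random.Random(1))
f1 = o.call(lambda x: x + 1, 1)    # returns immediately with a future
f2 = o.call(lambda x: x * 2, 10)   # second call made before the first ran
print(get(f2, o), get(f1, o))
```

The caller obtains both futures before either method has executed, and the order in which the callee processes the two messages is left to its scheduler.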


3.2 An Operational Semantics for the Basic Language 


We present the semantics of the basic active object language as a transition
relation between configurations cn. In the runtime syntax (Fig. 4), a
configuration cn can be empty ($\varepsilon$), or a set of objects, futures, and
invocation messages. We let o and f be dynamically created names from a set of
object and future identifiers, denoted Identifiers.

$$\begin{array}{rclrcl}
cn &::=& \varepsilon \mid object \mid future \mid invoc \mid cn\ cn & q &::=& \varepsilon \mid process \mid q\ q \\
future &::=& \mathit{fut}(f, val) & val &::=& v \mid \bot \\
object &::=& \mathit{ob}(o, a, p, q) & a &::=& x \mapsto v \mid a\ a \\
process &::=& \{a \mid s\} & p &::=& process \mid \mathit{idle} \\
invoc &::=& \mathit{inv}(o, f, m, \overline{v}) & v &::=& o \mid f \mid \mathit{true} \mid \mathit{false} \mid t
\end{array}$$

Fig. 4: Runtime syntax; here, o and f are object and future identifiers.

An active object $\mathit{ob}(o, a, p, q)$ has an identifier o, attributes a, an
active process p (that may be idle) and an unordered process pool q. A future
$\mathit{fut}(f, val)$ has an identifier f and a value val (which is $\bot$ if the
future is not resolved). An invocation $\mathit{inv}(o, f, m, \overline{v})$ is a
message to object o to activate method m with actual parameters $\overline{v}$ and
send the return value to the future f. Attributes bind program variables x to
values v. A process $\{a \mid s\}$ has local variables a and a statement list s to
execute. Values are object identifiers o, future identifiers f, Boolean values
true and false, and other literal values t (e.g., natural numbers). The initial
state of a program consists of a single active object
$\mathit{ob}(o_{main}, \emptyset, p, \emptyset)$, where the active process p
corresponds to the main block of the program. Let names(cn) denote the set of
object and future identifiers occurring in a configuration cn.

Figure 5 presents the main rules of the transition relation $cn \to cn'$. A run
is a finite sequence of configurations $cn_0, cn_1, \ldots, cn_n$ such that
$cn_i \to cn_{i+1}$ for $0 \le i < n$. We assume configurations to be
associative and commutative (so we can reorder configurations to match rules),
and let $\to^{*}$ denote the reflexive and transitive closure of $\to$. Let
$\mathit{bind}(m, \overline{v}, f, C)$ denote method lookup in the class table,
returning the process corresponding to method m in class C with actual
parameters $\overline{v}$ and with future f as the return address of the call.
Thus, every process has a local variable destiny which denotes the return
address of the process (i.e., the future that the process will resolve upon
completion), similar to the self-reference this for objects. We omit
explanations for the standard rules for assignment to fields and local
variables, conditionals, while and skip.

Rule ACTIVATE formalizes the scheduling of a process p from the unordered
queue q when an active object is idle. In ASYNC-CALL, an asynchronous method
call creates a message to a target object o' and an unresolved future with a fresh
name f. Object creation in NEW-ACTOR creates a new active object with a fresh
identifier o', and initializes its attributes with initAttributes(C, o'), including
a reference to itself (this). These are the only rules that introduce new names for
identifiers; let a predicate fresh(o) denote that o is a fresh name in the global
configuration (abstracting from how this is implemented). Rule LOAD puts the
process corresponding to an invocation message in the called object's queue. Rule
RETURN resolves the future associated with a process with return value v, and
READ-FUT fetches the value v of a future f into a variable. With rule CONTEXT,
parallel execution in different active objects has an interleaving semantics.


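Definition 1 (Stable configurations). A configuration cn is stable if, for all
objects in cn, the execution is blocked or the object needs to make a scheduling
decision. An object is blocked if it needs to execute a get-statement. An object
needs to make a scheduling decision if its active process is idle.


(ACTIVATE)
$$\frac{p \in q}{\mathit{ob}(o, a, \mathit{idle}, q) \to \mathit{ob}(o, a, p, q \setminus \{p\})}$$

(ASSIGN1)
$$\frac{v = [\![e]\!]_{(a \circ l)} \quad x \in \mathit{dom}(l)}{\mathit{ob}(o, a, \{l \mid x = e; s\}, q) \to \mathit{ob}(o, a, \{l[x \mapsto v] \mid s\}, q)}$$

(ASSIGN2)
$$\frac{v = [\![e]\!]_{(a \circ l)} \quad x \notin \mathit{dom}(l)}{\mathit{ob}(o, a, \{l \mid x = e; s\}, q) \to \mathit{ob}(o, a[x \mapsto v], \{l \mid s\}, q)}$$

(COND1)
$$\frac{\mathit{true} = [\![e]\!]_{(a \circ l)}}{\mathit{ob}(o, a, \{l \mid \mathtt{if}\ e\ \{s_1\}\ \mathtt{else}\ \{s_2\}; s\}, q) \to \mathit{ob}(o, a, \{l \mid s_1; s\}, q)}$$

(COND2)
$$\frac{\mathit{false} = [\![e]\!]_{(a \circ l)}}{\mathit{ob}(o, a, \{l \mid \mathtt{if}\ e\ \{s_1\}\ \mathtt{else}\ \{s_2\}; s\}, q) \to \mathit{ob}(o, a, \{l \mid s_2; s\}, q)}$$

(WHILE)
$$\frac{s_1' = s_1; \mathtt{while}\ e\ \{s_1\}}{\mathit{ob}(o, a, \{l \mid \mathtt{while}\ e\ \{s_1\}; s_2\}, q) \to \mathit{ob}(o, a, \{l \mid \mathtt{if}\ e\ \{s_1'\}\ \mathtt{else}\ \{\mathtt{skip}\}; s_2\}, q)}$$

(SKIP1)
$$\mathit{ob}(o, a, \{l \mid \mathtt{skip}; s\}, q) \to \mathit{ob}(o, a, \{l \mid s\}, q)$$

(SKIP2)
$$\mathit{ob}(o, a, \{l \mid \mathtt{skip}\}, q) \to \mathit{ob}(o, a, \mathit{idle}, q)$$

(NEW-ACTOR)
$$\frac{a' = \mathit{initAttributes}(C, o') \quad \mathit{fresh}(o')}{\mathit{ob}(o, a, \{l \mid x = \mathtt{new}\ C(); s\}, q) \to \mathit{ob}(o, a, \{l \mid x = o'; s\}, q)\ \mathit{ob}(o', a', \mathit{idle}, \emptyset)}$$

(CONTEXT)
$$\frac{cn_1 \to cn_1'}{cn_1\ cn_2 \to cn_1'\ cn_2}$$

(ASYNC-CALL)
$$\frac{o' = [\![e]\!]_{(a \circ l)} \quad \overline{v} = [\![\overline{e}]\!]_{(a \circ l)} \quad \mathit{fresh}(f)}{\mathit{ob}(o, a, \{l \mid x = e!m(\overline{e}); s\}, q) \to \mathit{ob}(o, a, \{l \mid x = f; s\}, q)\ \mathit{inv}(o', f, m, \overline{v})\ \mathit{fut}(f, \bot)}$$

(LOAD)
$$\frac{p' = \mathit{bind}(m, \overline{v}, f, \mathit{classOf}(o))}{\mathit{inv}(o, f, m, \overline{v})\ \mathit{ob}(o, a, p, q) \to \mathit{ob}(o, a, p, q \cup \{p'\})}$$

(RETURN)
$$\frac{v = [\![e]\!]_{(a \circ l)} \quad l(\mathit{destiny}) = f}{\mathit{ob}(o, a, \{l \mid \mathtt{return}\ e\}, q)\ \mathit{fut}(f, \bot) \to \mathit{ob}(o, a, \mathit{idle}, q)\ \mathit{fut}(f, v)}$$

(READ-FUT)
$$\frac{v \neq \bot \quad f = [\![e]\!]_{(a \circ l)}}{\mathit{ob}(o, a, \{l \mid x = e.\mathtt{get}; s\}, q)\ \mathit{fut}(f, v) \to \mathit{ob}(o, a, \{l \mid x = v; s\}, q)\ \mathit{fut}(f, v)}$$

Fig. 5: Semantics of the basic active object language.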


Let G denote a stable configuration. We say that two stable configurations
$G_1$ and $G_2$ are consecutive in a run $G_1 \to^{*} G_2$ if, for all cn such
that $G_1 \to^{*} cn$ and $cn \to^{*} G_2$, if $cn \neq G_1$ and $cn \neq G_2$
then cn is not a stable configuration.


Lemma 1 (Reordering of atomic sections). Let $G_1$ and $G_2$ be stable
configurations. If $G_1 \to^{*} G_2$, then there exists a run between $G_1$ and
$G_2$ in which only a single object executes between any two consecutive stable
configurations.


Proof (sketch). Observe that the notion of stability captures any state of an 
object in which it needs input from its environment. The proof then follows 
from the fact that the state spaces of different objects are disjoint and that 
message passing is unordered. This allows consecutive independent execution 
steps from different objects to be reordered. 
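(LOCAL-ASSIGN1)
$$\frac{v = [\![e]\!]_{(a \circ l)} \quad x \in \mathit{dom}(l)}{a, \{l \mid x = e; s\} \rightsquigarrow a, \{l[x \mapsto v] \mid s\}}$$

(LOCAL-ASSIGN2)
$$\frac{v = [\![e]\!]_{(a \circ l)} \quad x \notin \mathit{dom}(l)}{a, \{l \mid x = e; s\} \rightsquigarrow a[x \mapsto v], \{l \mid s\}}$$

(LOCAL-WHILE)
$$\frac{s_1' = s_1; \mathtt{while}\ e\ \{s_1\}}{a, \{l \mid \mathtt{while}\ e\ \{s_1\}; s_2\} \rightsquigarrow a, \{l \mid \mathtt{if}\ e\ \{s_1'\}\ \mathtt{else}\ \{\mathtt{skip}\}; s_2\}}$$

(LOCAL-SKIP1)
$$a, \{l \mid \mathtt{skip}; s\} \rightsquigarrow a, \{l \mid s\}$$

(LOCAL-SKIP2)
$$a, \{l \mid \mathtt{skip}\} \rightsquigarrow a, \mathit{idle}$$

(LOCAL-COND1)
$$\frac{\mathit{true} = [\![e]\!]_{(a \circ l)}}{a, \{l \mid \mathtt{if}\ e\ \{s_1\}\ \mathtt{else}\ \{s_2\}; s\} \rightsquigarrow a, \{l \mid s_1; s\}}$$

(LOCAL-COND2)
$$\frac{\mathit{false} = [\![e]\!]_{(a \circ l)}}{a, \{l \mid \mathtt{if}\ e\ \{s_1\}\ \mathtt{else}\ \{s_2\}; s\} \rightsquigarrow a, \{l \mid s_2; s\}}$$

(GLOBAL-ACTIVATE)
$$\frac{p \in q \quad p = \{l \mid s\} \quad l(\mathit{destiny}) = f \quad a, p \stackrel{!}{\rightsquigarrow} a', p'}{\mathit{ob}(o, a, \mathit{idle}, q) \xrightarrow{\mathit{sched}\langle o, f\rangle} \mathit{ob}(o, a', p', q \setminus \{p\})}$$

(GLOBAL-RETURN)
$$\frac{v = [\![e]\!]_{(a \circ l)} \quad l(\mathit{destiny}) = f}{\mathit{ob}(o, a, \{l \mid \mathtt{return}\ e\}, q)\ \mathit{fut}(f, \bot) \xrightarrow{\mathit{futWr}\langle o, f\rangle} \mathit{ob}(o, a, \mathit{idle}, q)\ \mathit{fut}(f, v)}$$

(GLOBAL-CONTEXT)
$$\frac{cn_1 \xrightarrow{ev?} cn_1'}{cn_1\ cn_2 \xrightarrow{ev?} cn_1'\ cn_2}$$

(GLOBAL-NEW-ACTOR)
$$\frac{a'' = \mathit{initAttributes}(C, o') \quad \mathit{fresh}(o') \quad a, \{l \mid x = o'; s\} \stackrel{!}{\rightsquigarrow} a', p'}{\mathit{ob}(o, a, \{l \mid x = \mathtt{new}\ C(); s\}, q) \xrightarrow{\mathit{new}\langle o, o'\rangle} \mathit{ob}(o, a', p', q)\ \mathit{ob}(o', a'', \mathit{idle}, \emptyset)}$$

(GLOBAL-READ-FUT)
$$\frac{v \neq \bot \quad f = [\![e]\!]_{(a \circ l)} \quad a, \{l \mid x = v; s\} \stackrel{!}{\rightsquigarrow} a', p'}{\mathit{ob}(o, a, \{l \mid x = e.\mathtt{get}; s\}, q)\ \mathit{fut}(f, v) \xrightarrow{\mathit{futRe}\langle o, f\rangle} \mathit{ob}(o, a', p', q)\ \mathit{fut}(f, v)}$$

(GLOBAL-ASYNC-CALL)
$$\frac{o' = [\![e]\!]_{(a \circ l)} \quad \overline{v} = [\![\overline{e}]\!]_{(a \circ l)} \quad \mathit{fresh}(f) \quad a, \{l \mid x = f; s\} \stackrel{!}{\rightsquigarrow} a', p'}{\mathit{ob}(o, a, \{l \mid x = e!m(\overline{e}); s\}, q) \xrightarrow{\mathit{inv}\langle o, f\rangle} \mathit{ob}(o, a', p', q)\ \mathit{inv}(o', f, m, \overline{v})\ \mathit{fut}(f, \bot)}$$

(GLOBAL-LOAD)
$$\frac{p' = \mathit{bind}(m, \overline{v}, f, \mathit{classOf}(o))}{\mathit{inv}(o, f, m, \overline{v})\ \mathit{ob}(o, a, p, q) \to \mathit{ob}(o, a, p, q \cup \{p'\})}$$

Fig. 6: Coarse-grained, labelled semantics of the basic active object language.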


3.3 A Labelled Operational Semantics for the Basic Language 


Based on Lemma 1, we can define a semantics of the basic active object language 
with a more coarse-grained model of interleaving which is equivalent to the 
semantics presented in Fig. 5. We let this coarse-grained semantics be labeled 
by events to record the interaction between an active object and its environment. 
The events are defined as follows: 


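Definition 2 (Events). Let $o, o', f \in \mathit{Identifiers}$. The set E of events ev is given by
$$ev ::= \mathit{new}\langle o, o'\rangle \mid \mathit{inv}\langle o, f\rangle \mid \mathit{sched}\langle o, f\rangle \mid \mathit{futWr}\langle o, f\rangle \mid \mathit{futRe}\langle o, f\rangle.$$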


In the coarse-grained semantics, a transition relation
$a, p \rightsquigarrow a', p'$ captures local execution in an active object with
attributes a. These rules are given in Fig. 6 (top) and correspond to the rules
ASSIGN1, ASSIGN2, WHILE, COND1, COND2, SKIP1 and SKIP2 of Fig. 5. These rules
are deterministic as there is at most one possible reduction for any given pair
a, p. Let $\rightsquigarrow^{*}$ denote the reflexive, transitive closure of
$\rightsquigarrow$, let the unary relation $\not\rightsquigarrow$ denote that
there is no transition from a given pair a, p, and let the relation
$\stackrel{!}{\rightsquigarrow}$ denote the reduction to normal form according to
$\rightsquigarrow$; i.e.,

$$a, p \stackrel{!}{\rightsquigarrow} a', p' \iff a, p \rightsquigarrow^{*} a', p' \,\wedge\, a', p' \not\rightsquigarrow$$


In the remaining rules, given in Fig. 6 (bottom), a labelled transition relation
$cn \xrightarrow{ev} cn'$ captures transitions in which the local execution of an
active object interacts with its environment through scheduling, object
creation, method invocation, or interaction with futures. These rules also
correspond to the similar rules in Fig. 5, with two differences:


1. The rules are labelled with an event reflecting the particular action taken in
the transition, and

2. the rules perform a local deterministic reduction to normal form according
to the $\rightsquigarrow$ relation in each step.


Remark that rule GLOBAL-LOAD is identical to LOAD of Fig. 5; although we do not
need to add an explicit label, the rule is kept at the global level since it involves
both an object and a message. Rule GLOBAL-CONTEXT is labeled by ev? to capture
that the label is optional (i.e., the rule also combines with GLOBAL-LOAD).
We henceforth consider runs for the basic active object language based on this
labelled semantics.


3.4 Execution Traces and their Reordering 


This section looks at traces reflecting the runs of programs in the basic active
object language according to the semantics of Section 3.3, and their reordering.
We consider traces over events in E. Let $\varepsilon$ denote the empty trace, and
$\tau_1 \cdot \tau_2$ the concatenation of traces $\tau_1$ and $\tau_2$. For an
event ev and a trace $\tau$, we denote by $ev \in \tau$ that ev occurs somewhere
in $\tau$ and by $\tau\ \mathit{ew}\ ev$ that $\tau$ ends with ev (i.e.,
$\exists \tau'.\ \tau = \tau' \cdot ev$). Define $\tau/o$ and $\tau/f$ as the
projection of a trace $\tau$ to the alphabet of an object o and a future f, by
their first or second argument respectively (where an alphabet is the set of
events involving that name). Finally, let $\mathit{names}(\tau)$ denote the
inductively defined function returning the set of identifiers that occur in a
trace $\tau$ (e.g., $\mathit{names}(\mathit{inv}\langle o, f\rangle) = \{o, f\}$).
We assume that every initial configuration has a main object and process, and
let $\mathit{names}(\varepsilon) = \{o_{main}, f_{main}\}$.
Given a run $cn_0 \xrightarrow{ev_0} \cdots \xrightarrow{ev_n} cn_{n+1}$, we
denote by $cn_0 \stackrel{\tau}{\Longrightarrow} cn_{n+1}$ that $\tau$ is the
trace of the run, where $\tau = ev_0 \cdots ev_n$ (and $\tau$ ignores the
unlabeled transition steps of the run). Well-formed traces can be defined as
follows, based on [12]:
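Definition 3 (Well-formed Traces). Given $o, o', f \in \mathit{Identifiers}$. Let
$\mathit{wf}(\tau)$ denote that $\tau$ is well-formed, defined inductively:

$$\begin{array}{lcl}
\mathit{wf}(\varepsilon) &\iff& \mathit{True} \\
\mathit{wf}(\tau \cdot \mathit{new}\langle o, o'\rangle) &\iff& \mathit{wf}(\tau) \wedge o \in \mathit{names}(\tau) \wedge o' \notin \mathit{names}(\tau) \\
\mathit{wf}(\tau \cdot \mathit{inv}\langle o, f\rangle) &\iff& \mathit{wf}(\tau) \wedge o \in \mathit{names}(\tau) \wedge f \notin \mathit{names}(\tau) \\
\mathit{wf}(\tau \cdot \mathit{sched}\langle o, f\rangle) &\iff& \mathit{wf}(\tau) \wedge o \in \mathit{names}(\tau) \wedge \tau/f = \mathit{inv}\langle o', f\rangle \\
\mathit{wf}(\tau \cdot \mathit{futWr}\langle o, f\rangle) &\iff& \mathit{wf}(\tau) \wedge \tau/f\ \mathit{ew}\ \mathit{sched}\langle o, f\rangle \\
\mathit{wf}(\tau \cdot \mathit{futRe}\langle o, f\rangle) &\iff& \mathit{wf}(\tau) \wedge \mathit{futWr}\langle o', f\rangle \in \tau
\end{array}$$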
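
The inductive definition can be turned into an executable check. The following Python sketch assumes that events are encoded as tuples (kind, first, second) and that a trace is a list of such tuples; this encoding and the function names are choices made for the illustration, not part of the formalization.

```python
def names(trace):
    """Identifiers occurring in a trace; the empty trace knows main's names."""
    result = {"o_main", "f_main"}
    for _, first, second in trace:
        result |= {first, second}
    return result


def project_future(trace, f):
    """tau/f: events whose second argument is the future f."""
    return [ev for ev in trace if ev[0] != "new" and ev[2] == f]


def wf(trace):
    """Check the well-formedness condition for every prefix of the trace."""
    for i, (kind, o, x) in enumerate(trace):
        prefix = trace[:i]
        seen = names(prefix)
        on_x = project_future(prefix, x)
        if kind in ("new", "inv"):
            # the acting object is known, the created name is new
            ok = o in seen and x not in seen
        elif kind == "sched":
            # the future has been invoked and nothing else happened on it
            ok = o in seen and len(on_x) == 1 and on_x[0][0] == "inv"
        elif kind == "futWr":
            # the last event on the future is its scheduling on o
            ok = bool(on_x) and on_x[-1] == ("sched", o, x)
        elif kind == "futRe":
            # the future has been written by some object
            ok = any(ev[0] == "futWr" and ev[2] == x for ev in prefix)
        else:
            raise ValueError(f"unknown event kind: {kind}")
        if not ok:
            return False
    return True


trace = [("new", "o_main", "o"), ("inv", "o_main", "f_m1"),
         ("sched", "o", "f_m1")]
print(wf(trace), wf(trace[:1] + trace[2:]))
```

The second call drops the invocation event, so the scheduling event refers to a future that was never invoked and the trace is rejected.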


Wellformedness thus captures a happens-before relation over events while 
ensuring that certain identifiers are new at given points in the trace. Din and 
Owe have shown that the trace of any run of the semantics of an active object 
language similar to ours is well-formed [12]. For example, no process can be 
scheduled unless it has been invoked (which again requires the GLOBAL-LOAD 
rule to apply in between GLOBAL-ASYNC-CALL and GLOBAL-ACTIVATE). Given
a trace $\tau$, we can now define the equivalence class $[\tau]$ of traces which
preserve the local ordering and the wellformedness of $\tau$, as follows:


Definition 4 (Global trace set). Let $\tau$ be a trace and define
$$[\tau] = \{\tau' \mid \tau'/o = \tau/o \text{ for all object identifiers } o \in \mathit{names}(\tau) \wedge \mathit{wf}(\tau')\}.$$


Remark that this construction is closely related to equivalence classes in
Mazurkiewicz trace theory [13], with wellformedness as the dependency relation
of the equivalence classes.


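Example 1. The program from Fig. 1 (Section 2) has the following traces:

$$\begin{array}{rcl}
\tau_1 &=& \mathit{new}\langle o_{main}, o\rangle \cdot \mathit{inv}\langle o_{main}, f_{m1}\rangle \cdot \mathit{inv}\langle o_{main}, f_{m2}\rangle \cdot \mathit{sched}\langle o, f_{m1}\rangle \cdot \mathit{sched}\langle o, f_{m2}\rangle \\
\tau_2 &=& \mathit{new}\langle o_{main}, o\rangle \cdot \mathit{inv}\langle o_{main}, f_{m1}\rangle \cdot \mathit{sched}\langle o, f_{m1}\rangle \cdot \mathit{inv}\langle o_{main}, f_{m2}\rangle \cdot \mathit{sched}\langle o, f_{m2}\rangle \\
\tau_3 &=& \mathit{new}\langle o_{main}, o\rangle \cdot \mathit{inv}\langle o_{main}, f_{m1}\rangle \cdot \mathit{inv}\langle o_{main}, f_{m2}\rangle \cdot \mathit{sched}\langle o, f_{m2}\rangle \cdot \mathit{sched}\langle o, f_{m1}\rangle
\end{array}$$

Observe that traces $\tau_1$ and $\tau_2$ belong to the same global trace set
(i.e., $[\tau_1] = [\tau_2]$), and will produce the same final state. Trace
$\tau_3$ schedules the two calls on o in the opposite order, so $\tau_3/o$
differs from $\tau_1/o$ and $\tau_3 \notin [\tau_1]$.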
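
The comparison of local projections in Example 1 can be replayed mechanically. The sketch below assumes the tuple encoding (kind, first, second) for events and reads the three traces with m1 scheduled before m2 in the first two and after m2 in the third; both the encoding and the helper names are assumptions of this illustration. Wellformedness is not rechecked here.

```python
def project_object(trace, o):
    """tau/o: the events of the trace whose first argument is the object o."""
    return [ev for ev in trace if ev[1] == o]


def same_local_projections(t1, t2):
    """Compare tau/o for every object acting in either trace."""
    objects = {ev[1] for ev in t1} | {ev[1] for ev in t2}
    return all(project_object(t1, o) == project_object(t2, o)
               for o in objects)


new = ("new", "o_main", "o")
inv1, inv2 = ("inv", "o_main", "f_m1"), ("inv", "o_main", "f_m2")
sched1, sched2 = ("sched", "o", "f_m1"), ("sched", "o", "f_m2")

tau1 = [new, inv1, inv2, sched1, sched2]
tau2 = [new, inv1, sched1, inv2, sched2]
tau3 = [new, inv1, inv2, sched2, sched1]

print(same_local_projections(tau1, tau2))  # interleavings of the same run
print(same_local_projections(tau1, tau3))  # o schedules in another order
```

Only the relative order of events within each object matters for the comparison; the global interleaving of events from main and o is irrelevant.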


Let $G \xrightarrow{o,f} G'$ denote a run between consecutive stable
configurations which executes the process identified by f on object o in the
stable configuration G until the next stable configuration G'. If
$\mathit{sched}\langle o, f\rangle \cdot \tau$ is the trace of
$G \xrightarrow{o,f} G'$, then $\tau$ is a trace over the event set
$\{\mathit{inv}\langle o, f'\rangle, \mathit{new}\langle o, o'\rangle, \mathit{futWr}\langle o, f\rangle \mid o', f' \in \mathit{Identifiers}\}$.
This observation provides an intuition for the following lemma:


Lemma 2 (Local confluence). Let $G_1, G_2, G_3$ be stable configurations, o, o'
object and f, f' future identifiers, with $o \neq o'$, $f \neq f'$. If
$G_1 \xrightarrow{o,f} G_2$ and $G_1 \xrightarrow{o',f'} G_3$, then there is a
stable configuration $G_4$ such that $G_2 \xrightarrow{o',f'} G_4$ and
$G_3 \xrightarrow{o,f} G_4$.


Proof (sketch). The proof follows from the fact that execution in one object
does not prevent a process from running in another object.
The following theorem shows that local confluence implies global confluence
for executions in the same global trace set (which means that the two executions
agree on the local trace projections).


Theorem 1 (Global confluence). Let $G_1, G_2, G_3$ be stable configurations and
$\tau_1, \tau_2$ traces such that $G_1 \stackrel{\tau_1}{\Longrightarrow} G_2$
and $G_1 \stackrel{\tau_2}{\Longrightarrow} G_3$. If $\tau_2 \in [\tau_1]$ then
$G_2 = G_3$.


Proof (sketch). Observe that runs with traces in the same global trace set must
agree on the naming of objects and futures. The result then follows by induction
over the length of $G_1 \stackrel{\tau_1}{\Longrightarrow} G_2$ from local
confluence (Lemma 2).


4 Global Reproducibility with Local Traces 


The global confluence of executions with traces in the same global trace set 
provides a formal justification for a method to obtain global reproducibility for 
distributed active object systems which exhibit non-deterministic behavior. The 
method is based on enforcing the local trace projection from the global trace 
set on each active object. For the basic active object language, the method is 
based on recording the events from the set E during an execution. This set of
events, which includes events capturing the scheduling decisions of the runtime 
system as well as the choice of dynamically created names during a particular 
execution, is sufficient to establish the wellformedness of the recorded trace and 
identify the global trace set of the recorded run. Furthermore, if we record local 
traces for each active object, these will correspond to the local trace projections 
of the global trace set. In fact, any composition of local traces recorded during 
a run will result in the same global trace set. Similarly, any composition of local 
trace projections enforced during a replay will result in a trace in the same 
global trace set. Thus, Theorem 1 guarantees that local recording and replay of 
different traces from the same global trace set will result in the same final state. 
It remains to show that for any such trace in the global trace set corresponding to 
a recorded run, the execution during replay will not get stuck. For this purpose, 
we now formalize record and replay as extensions to the semantics of the basic 
active object language. 

We extend the operational semantics of Fig. 6 to record and replay traces.
Let $\tau \triangleright cn$ denote an extended runtime configuration, where
$\tau$ is a witness for cn, playing dual roles for recording and replaying. A
recorded run starts from an initial configuration $\varepsilon \triangleright cn$,
where cn is the initial configuration of the run to be recorded. The reduction
system for recording a trace is given as a relation $\to_{\mathit{rec}}$ by
the rules in Fig. 7; the two rules correspond to the unlabeled (just GLOBAL-LOAD)
and labeled transitions of the semantics, respectively. A replay starts from an
initial configuration $\tau \triangleright cn$, where $\tau$ is a trace and cn the
initial configuration of the run to be replayed. The reduction system for
replaying a trace is given as a relation $\to_{\mathit{rep}}$ by the rules in
Fig. 8; the two rules are symmetric to those for recording a run. The rules in
Fig. 7 and Fig. 8 formalize the obvious relation between the recording and
replaying of a trace and a run in the semantics of the basic active object
language. Let $\Rightarrow_{\mathit{rec}}$ and $\Rightarrow_{\mathit{rep}}$ denote
the reflexive, transitive closures of $\to_{\mathit{rec}}$ and
$\to_{\mathit{rep}}$, respectively.

(UNLABELED-RECORD)
$$\frac{cn \to cn'}{\tau \triangleright cn \to_{\mathit{rec}} \tau \triangleright cn'}$$

(LABELED-RECORD)
$$\frac{cn \xrightarrow{ev} cn'}{\tau \triangleright cn \to_{\mathit{rec}} \tau \cdot ev \triangleright cn'}$$

Fig. 7: Semantics of Record

(UNLABELED-REPLAY)
$$\frac{cn \to cn'}{\tau \triangleright cn \to_{\mathit{rep}} \tau \triangleright cn'}$$

(LABELED-REPLAY)
$$\frac{cn \xrightarrow{ev} cn'}{ev \cdot \tau \triangleright cn \to_{\mathit{rep}} \tau \triangleright cn'}$$

Fig. 8: Semantics of Replay
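
The symmetry between recording and replaying can be illustrated on the program of Fig. 1. In the Python sketch below, recording appends each scheduling event to the trace after the step is taken, and replay only permits the step at the head of the trace. The class Obj and the functions program, record and replay are hypothetical names; only the scheduling events of object o are modelled, which is a simplification of the full event set.

```python
import random


class Obj:
    """Active object o of Fig. 1 with its pending invocations."""

    def __init__(self):
        self.n = 1
        self.queue = {}   # future name -> method body to execute


def program():
    """State after main has created o and sent both messages."""
    o = Obj()
    o.queue["f_m1"] = lambda: setattr(o, "n", o.n - 1)
    o.queue["f_m2"] = lambda: setattr(o, "n", o.n * 3)
    return o


def record(rng):
    """Record: the scheduler chooses freely; each event extends the trace."""
    o, trace = program(), []
    while o.queue:
        f = rng.choice(sorted(o.queue))
        o.queue.pop(f)()
        trace.append(("sched", "o", f))
    return o.n, trace


def replay(trace):
    """Replay: only the event at the head of the trace may be taken."""
    o, remaining = program(), list(trace)
    while remaining:
        _, _, f = remaining.pop(0)
        o.queue.pop(f)()   # a KeyError here would mean the replay is stuck
    return o.n


n, trace = record(random.Random())
print(n, trace, replay(trace))
```

Whatever order the recording run happens to choose, replaying its trace reaches the same final value of n.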


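Lemma 3 (Freshness of names). For any recording
$\varepsilon \triangleright cn \Rightarrow_{\mathit{rec}} \tau \triangleright cn'$
or replay
$\tau \cdot \tau' \triangleright cn \Rightarrow_{\mathit{rep}} \tau' \triangleright cn'$,
we have that $\mathit{names}(\tau) = \mathit{names}(cn')$.


Proof (sketch). Follows by induction over the length of
$\varepsilon \triangleright cn \Rightarrow_{\mathit{rec}} \tau \triangleright cn'$
and
$\tau \cdot \tau' \triangleright cn \Rightarrow_{\mathit{rep}} \tau' \triangleright cn'$,
respectively.


It follows from Lemma 3 that given an identifier $x \in \mathit{Identifiers}$ and
a run
$\varepsilon \triangleright cn \Rightarrow_{\mathit{rec}} \tau \triangleright cn'$,
if $x \notin \mathit{names}(\tau)$, then $x \notin \mathit{names}(cn')$ and
consequently, the predicate fresh(x) will hold as a premise for any rule in the
semantics that one may want to apply to cn'. Consequently, fresh-predicates in
the premises of the transition rules of the basic active language will accept
the identifier names chosen from the recorded trace when replaying a run.


Lemma 4 (Progress for replay by global trace). Let G, G' be stable
configurations. If
$\varepsilon \triangleright G \Rightarrow_{\mathit{rec}} \tau \triangleright G'$
then
$\tau \triangleright G \Rightarrow_{\mathit{rep}} \varepsilon \triangleright G'$.


Proof. The proof is by induction over the length of the run
$\varepsilon \triangleright G \Rightarrow_{\mathit{rec}} \tau \triangleright G'$.
The base case is obvious. We assume (IH) that if
$\varepsilon \triangleright G \Rightarrow_{\mathit{rec}} \tau \triangleright cn$
then
$\tau \triangleright G \Rightarrow_{\mathit{rep}} \varepsilon \triangleright cn$
and show that if
$\varepsilon \triangleright G \Rightarrow_{\mathit{rec}} \tau \cdot ev \triangleright cn'$
then
$\tau \cdot ev \triangleright G \Rightarrow_{\mathit{rep}} \varepsilon \triangleright cn'$.
By the IH, this amounts to showing that if
$\varepsilon \triangleright cn \to_{\mathit{rec}} \varepsilon \cdot ev \triangleright cn'$
then
$ev \cdot \varepsilon \triangleright cn \to_{\mathit{rep}} \varepsilon \triangleright cn'$.
The proof proceeds by cases over the transition rules of the basic active object
language (cf. Fig. 5). The interesting cases are the rules which need new names.
Lemma 3 ensures that the predicate fresh(o) will hold for a new name o in ev
(and similarly for f), and the corresponding rules can be applied.


It follows from Theorem 1 that if we can replay a run which is equivalent to
a recorded run $\tau$, the final state of the replayed run will be the same as
for the recorded run. It remains to show that any run in the equivalence class
$[\tau]$ can in fact be replayed.

Theorem 2 (Progress for replay by local control). Let G, G' be stable
configurations, $\tau, \tau'$ traces. If
$\varepsilon \triangleright G \Rightarrow_{\mathit{rec}} \tau \triangleright G'$
and $\tau' \in [\tau]$, then
$\tau' \triangleright G \Rightarrow_{\mathit{rep}} \varepsilon \triangleright G'$.


Proof (sketch). We show by induction over the length of trace $\tau$ that if
$\varepsilon \triangleright G \Rightarrow_{\mathit{rec}} \tau \triangleright cn$
and $\tau' \in [\tau]$, then
$\tau' \triangleright G \Rightarrow_{\mathit{rep}} \varepsilon \triangleright cn'$.
It then follows from Theorem 1 that $cn = cn'$.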
5 Extensions for Richer Active Object Languages


The method for global reproducibility of executions for a basic active object
language based on record & replay of local traces may be extended to include
features introducing other sources of non-determinism in richer active object
languages [2]. We here briefly review some such features and how the method
may be extended to cover them.


Cooperative scheduling. In cooperatively scheduled languages (e.g., [3-5, 14]),
methods may explicitly release control, allowing other pending method
invocations to be scheduled. The criterion for being rescheduled may be that some
Boolean condition is met or that a future is resolved. Note that a method still
executes until it cooperatively releases control; i.e., a method will not be
interrupted because the condition of another method is satisfied. With
cooperative scheduling, the same task may be scheduled several times, which
means that the same scheduling event may occur multiple times in a trace. In the
method for reproducibility, this extension can be covered by an additional
suspension event reflecting the processor release and an adjustment of the
wellformedness condition to reflect that a scheduling event comes either after
an invocation event (as for the basic language) or after a suspension event on
the same future.


Concurrent object groups. In languages with concurrent object groups (e.g., [3,5]),
a group of concurrent objects (or cog) shares a common scheduler and becomes
the unit of distribution; this gives an interleaved semantics between objects
within the same cog, while separate cogs are truly concurrent. For record
& replay, the events of a trace need to capture the cog, rather than the object,
in which an event originated. Recording the names of cogs is sufficient for
reproducibility without controlling the naming of objects. For the reproducibility
method, the proofs in Section 4 would use an equivalence relation between
configurations that only differ in the choice of object names inside the cogs, and
the global trace set (Def. 4) would project on cogs rather than objects.


Resource-aware behavior. Active objects may reside in a resource center with
limited resources (e.g., CPU or memory restrictions) with respect to time (e.g.,
[6, 15]). Statements may have some associated cost which requires available
resources in order to execute. If there are insufficient resources, then execution
is blocked in that object until time advances. Here, objects compete for
resources, in the same sense that tasks compete for processing time. Following our
method for deterministic replay, the traces can be extended with events for
resource requests, in a similar manner to method invocations in the basic active
object language, and with events for resource provision, similar to the task
scheduling events.


External non-determinism and random numbers. Active object languages may 
also feature external factors that may influence an execution, such as input from a 
user, fetching data from a database or receiving input from a socket, or random 
number generation. Here, a purely ordering-based method is insufficient. Our 
replay method needs to be extended with events which include the data received
from the external source and the replay would need to fetch data from the trace 
rather than from the external source, similar to the reuse of object and future 
identifiers from the trace in the previous section. Random number generation 
can be seen as a special case of external non-determinism; for pseudo-random 
number generators it would be sufficient to only record the initial seed for reuse 
during replay. 
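
For pseudo-random number generation, recording the seed can be sketched as follows. The function simulate is a hypothetical stand-in for a model run that draws random numbers; the sketch uses Python's standard random module and is not taken from the Real-Time ABS simulator.

```python
import random


def simulate(rng, steps=5):
    """Stand-in for a run whose behaviour depends on random numbers."""
    return [rng.randint(0, 99) for _ in range(steps)]


def record_run():
    """Pick a fresh seed, store it with the trace, and run the model."""
    seed = random.SystemRandom().randrange(2**32)
    return seed, simulate(random.Random(seed))


def replay_run(seed):
    """Re-create the generator from the recorded seed and run again."""
    return simulate(random.Random(seed))


seed, recorded = record_run()
print(replay_run(seed) == recorded)
```

Only the seed has to be stored; the individual random values are regenerated in the same order during replay, provided the run requests them in the same local order.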


6 Implementing Record & Replay for Real-Time ABS 


We report on our implementation¹ of record & replay, based on the formalization
in Section 4. The implementation was done for Real-Time ABS [6,16], an ac- 
tive object modeling language which includes the following features discussed in 
Section 5: cooperative scheduling, concurrent object groups, and timed, resource- 
aware behavior, all of which are handled by our implementation. The simulator 
for Real-Time ABS models, written in Erlang, supports interaction with a model 
during execution via the Model API [17] in order to, e.g., fetch the current state
of an object, advance the simulated clock or visualize the resource consumption 
of a running model. In addition, we have implemented a visualizer for recorded 
traces. In this section, we discuss the following aspects of the implementation: 
the recording of traces in a distributed setting, the handling of names, the vi- 
sualization of traces, and performance characteristics for the implementation of 
record & replay. 


6.1 Recording Traces in a Distributed Setting 


For simulation, ABS models are transpiled to Erlang code by representing most 
entities as Erlang actors, e.g., concurrent object groups (or cogs), resource cen- 
ters, futures and ABS-level processes. Thus, execution is concurrent and may 
be distributed over multiple machines. This leads to two important differences 
from the formalization in Section 4: 


— True concurrency: The formalization is based on an interleaved concurrency 
model, which yields a total order of events. In the simulator, cogs are imple- 
mented as Erlang actors and may operate in true parallel, where two events 
may happen simultaneously, which corresponds to a partial order of events. 

— Distributed state: Because the state of the model is distributed over many 
independent actors, we cannot easily synchronize over the state of different 
actors. In the implementation, such synchronization in the formalization 
must be realized by asynchronous message passing protocols. 


1 The Real-Time ABS simulator is available at 
https://github.com/abstools/abstools 
The accompanying visualization tool is available at 
https://github.com/larstvei/ABS-traces 
These differences pose challenges for recording and replaying global traces in the
implementation. When recording a run, it is not trivial to obtain a global trace. If 
all cogs and resource centers were to report their recorded events to a single actor 
maintaining the global trace, races could occur between different asynchronous 
messages. For example, if an object o invokes a method on another object o’, then 
the corresponding invocation and scheduling events could arrive in any order. 
Such races could be resolved by, e.g., introducing additional synchronization or 
using Lamport timestamps [18,19]. Similarly, precisely replaying a global trace 
would require some synchronization protocol with the actor holding the global 
trace, severely increasing the level of synchronization during execution. 

We address these challenges by only considering the local projections of the 
global trace for each cog and resource center. The information needed to con- 
struct local traces does not require any additional synchronization. During re- 
play, only the local execution of an actor is controlled, which is sufficient to 
obtain a run with a trace in the same global trace set. 


6.2 Names in the Erlang Simulator 


The formalization allows recorded names to be reused when replaying a run. 
In contrast, in the Erlang system, cogs, resource centers and futures are
implemented as actors (i.e., Erlang processes) and identified by a process
identifier (pid) determined by Erlang. To ensure that names in the events of the
recorded trace
are easily identifiable in a replay without modifying the naming scheme of Er- 
lang, we construct additional names that are associated with the given pid. The 
constructed names follow a deterministic naming scheme, which guarantees that 
names are globally unique without depending on knowledge of names generated 
in other actors (in contrast to the fresh-predicate in the semantics). 

Cogs, resource centers and futures can be named locally following a naming 
scheme based on existing actors already having such unique, associated names. 
The name $(A_{id}, i + 1)$ of a new actor can be determined by the actor $A_{id}$ in
which it is created, together with a local counter denoting the number $i$ of
actors previously created in $A_{id}$. Thus, the name of the actor corresponds to its
place in the topology and is guaranteed to be fresh. 
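
The naming scheme can be sketched in a few lines of Python. This is an illustration only: the class `NamedActor`, the root name `(0,)`, and the flattening of the nested pair (parent name, counter) into a tuple are assumptions, not the data structures used in the Erlang simulator.

```python
class NamedActor:
    """Hypothetical actor carrying a deterministic, topology-based name."""

    def __init__(self, name=(0,)):
        self.name = name     # path from the root actor, encoded as a tuple
        self.created = 0     # number i of actors created so far by this actor

    def spawn(self):
        """Create a child named (own name, i + 1) using only local information."""
        self.created += 1
        return NamedActor(self.name + (self.created,))


root = NamedActor()
first = root.spawn()         # (0, 1)
second = root.spawn()        # (0, 2)
nested = first.spawn()       # (0, 1, 1)
```

Because each name extends the creator's name with a locally incremented counter, two different actors never produce the same name, and a replayed run that creates actors in the same local order regenerates the same names.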


6.3 Visualization of Recorded Traces 


The trace recorded during a simulation can give the user insight into that exe- 
cution of a model, since it captures the model’s communication structure. The 
recorded trace may be extracted from a running simulation via the Model API 
or written to file on termination. However, the terse format of the traces makes 
it hard for users to quickly get an intuitive idea of what is happening in the 
model. Complementing the replay facility, we have developed a tool to visual- 
ize recorded traces, which conveys information from traces in a more intuitive 
format. To facilitate visualization, the events in our implementation are slightly 
richer than those in Definition 2; e.g., they include the name of the method 
corresponding to the future in the event. 

The visualization reconstructs a global trace $\tau$ from its local projections.
Since the local ordering of events is already preserved by the recorded traces,
we only need to compose local traces in a way that preserves wellformedness.
We derive a happens-before relation $\prec$ from wellformedness (Definition 3), and
denote its transitive closure by $\prec^{+}$.

The transitive closure $\prec^{+}$ gives a partial order of events. In the
visualization of the trace $\tau$, each event is depicted by a colored dot. For any two
events $e_1$, $e_2$, $e_1$ is drawn above $e_2$ if $e_1 \prec^{+} e_2$; the events are drawn in the same
column only if they reside in the same cog or resource center. An arrow is drawn
between any two events $e_1$, $e_2$ if $e_1 \prec e_2$. Events that are independent (i.e.,
neither $e_1 \prec^{+} e_2$ nor $e_2 \prec^{+} e_1$) may be drawn in the same row. Events with the
same future as argument are drawn with the same color. The tool additionally 
supports simple navigation in the trace, gives visual indicators of simulated time 
steps, and supports time advancement in a running model through the model 
API, making it easy to step forward in time. Fig. 2 illustrates the visualization 
for two runs of the motivating example. 
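
The layout rules described above can be approximated as follows. This Python sketch is not the code of the visualization tool; the function `layout`, the event names, and the choice of placing each event in the earliest admissible row are assumptions made for the example. The direct happens-before edges are taken as input and are assumed to include the local ordering within each cog.

```python
from collections import defaultdict


def layout(events, edges):
    """Assign each event a (column, row) position.

    events: dict mapping event id -> name of its cog or resource center
    edges:  iterable of (e1, e2) pairs of the direct happens-before relation
    """
    preds = defaultdict(set)
    for e1, e2 in edges:
        preds[e2].add(e1)

    rows = {}

    def row(e, visiting=()):
        if e in visiting:
            raise ValueError("happens-before relation is cyclic")
        if e not in rows:
            # One row below the lowest predecessor; row 0 if there is none.
            rows[e] = 1 + max((row(p, visiting + (e,)) for p in preds[e]),
                              default=-1)
        return rows[e]

    columns = {}
    for actor in events.values():
        columns.setdefault(actor, len(columns))   # one column per actor
    return {e: (columns[actor], row(e)) for e, actor in events.items()}


events = {"invoke": "client", "schedule": "service",
          "complete": "service", "fetch": "client"}
edges = [("invoke", "schedule"), ("schedule", "complete"), ("invoke", "fetch")]
positions = layout(events, edges)
```

Here `schedule` and `fetch` are independent and end up in the same row, while every event lies strictly below all events that happen before it.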


6.4 Example 


Consider a Real-Time ABS model of an image rendering service which can pro- 
cess either still photos or video. The service is modeled as a class Service with 
two methods photo_request and video_request. The model captures resource- 
sensitive behavior in terms of cost annotations associated with the execution of 
skip-statements inside the two methods and in terms of deadlines provided to 
each method call. The processing cost for rendering an image is constant (here, 
the cost is given by the field image_cost), but the processing cost of rendering 
a video depends on the number of frames (captured by a parameter n to the 
method video_request). The success of each method call depends on whether 
it succeeded in processing its job, as specified by the cost annotation, before 
its deadline passes; this is captured by the expression in the return statement 
return (Duration(0) < deadline()). Note that deadline() is a predefined
read-only variable in Real-Time ABS processes; its value is given by the
caller.

In the main block, a server is created on which the service can run. This 
server is a resource-center with limited processing capacity (called a deployment 
component in Real-Time ABS [6]), restricting the amount of computation that 
can happen on the server per time interval in the execution of the model. The 
service is then deployed on the server (by an annotation [DC: server] on the
new-statement). We let a class Client (omitted here) model a given number of
processing requests to the image rendering service in terms of asynchronously calling
the two methods a given number of times (e.g., the call to video_request takes
the form [Deadline: Duration(10)] f = s!video_request(n)), pushing the
associated futures f to a list, and then counting the number of successful
requests when the corresponding futures have been resolved. It is easy to see that
the success of calls to the video_request method, which requires more resources,
may depend on whether it is scheduled before or after calls to photo_request,

class Service {
    Int image_cost = 1;

    // Rendering a still photo has a constant cost.
    Bool photo_request() {
        [Cost: image_cost] skip;
        return (Duration(0) < deadline());
    }

    // Rendering a video costs image_cost per frame.
    Bool video_request(Int n) {
        [Cost: n * image_cost] skip;
        return (Duration(0) < deadline());
    }
}

// Main block
{
    DC server = new DC("Server", 2);
    [DC: server] Service s1 = new Service();
    new Client(s1, 1, 100);
}

Fig. 9: Real-Time ABS code for the photo rendering service.

Fig. 10: Visualization of a run of the photo rendering service.


depending on the provided deadlines. Thus, the model exhibits both schedul- 
ing non-determinism for asynchronous calls and resource-aware behavior. The 
image in Fig. 10 depicts a trace from a simulation of the model, showing inter- 
actions between a deployment component (left), the service (middle) and the 
client (right). 
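
The dependence of the outcome on the scheduling order can be illustrated with a deliberately simplified calculation. The Python sketch below is not the Real-Time ABS semantics: it assumes that requests are processed one after another, that the server spends at most `capacity` resources per time interval, and that a request succeeds if it completes in an interval strictly before its deadline. The function `run_schedule` and the concrete costs and deadlines are hypothetical.

```python
def run_schedule(schedule, capacity):
    """Process requests sequentially on a server with a fixed capacity.

    schedule: list of (name, cost, deadline) in scheduling order, cost >= 1
    capacity: resources the server can spend per time interval
    Returns a dict mapping request name -> True if the deadline was met.
    """
    used = 0
    results = {}
    for name, cost, deadline in schedule:
        used += cost
        finished_at = (used - 1) // capacity   # interval in which the job completes
        results[name] = finished_at < deadline
    return results


video = ("video", 4, 2)   # four frames at image_cost 1, deadline 2
photo = ("photo", 1, 3)   # one image at image_cost 1, deadline 3

video_first = run_schedule([video, photo], capacity=2)
photo_first = run_schedule([photo, video], capacity=2)
```

Under these assumptions both requests succeed when the video request is scheduled first, whereas the video request misses its deadline when the photo request is scheduled first. Replaying a recorded trace fixes this choice.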


6.5 Performance Characteristics of the Implementation 


We give a brief evaluation of the performance characteristics of record & replay 
for Real-Time ABS. The size of the traces is proportional to the number of
objects, method invocations and resource provisions. Because we do not impose 
additional synchronization, we are able to achieve a constant-time overhead. To 
investigate how record & replay scales, we created a micro-benchmark perform- 
ing method invocations on an active object, and recorded execution times for 
Fig. 11: Record and replay: example (left) and process microbenchmark (right)


$10^2, 10^3, \ldots, 10^6$ method calls. We also ran the example of Section 6.4,
recording execution times for $10^2$, $10^3$ and $10^4$ Client iterations. These are worst-case
scenarios for record & replay, as the invoked methods do not perform any com- 
putation that does not result in creating an event. 

Fig. 11 shows the results for the two programs in three settings: with replay
enabled, with record enabled, and with the last release of Real-Time ABS that does
not feature record & replay. Note that we only measure simulation time and do not
include the time spent reading and writing trace files. We can see that the results of
Fig. 11 (left) are slightly improved and the overhead observed in Fig. 11 (right) is
about a factor of 1.8. We note that supporting record & replay in Real-Time ABS required
extensive modifications to the implementation of the Real-Time ABS simulator.


7 Related Work 


This work complements other analysis techniques for Real-Time ABS models,
such as simulation [17], deductive verification [9], parallel cost analysis [20],
and testing [21]. We here discuss related work on deterministic replay.
Deterministic replay is an emerging technique to reproduce executions of computer
programs in the presence of different non-deterministic factors [1]. It enables
cyclic debugging [22] in non-deterministic execution environments. Our focus is
on software-level reproducibility in the context of actor systems. Approaches to
reproduce specific runs of non-deterministic systems can be either content-based
or ordering-based [23].

Content-based methods trace the values read from a shared memory location. 
These are particularly suitable when there is a lot of external non-determinism 
(typically I/O operations, like user input). Content-based replay for actor
systems typically records messages, including the sender, receiver and message
content (see, e.g., [24-26]). This technique is typically used for rich debuggers like
Actoverse [24] for Scala’s Akka library, which provides visualization support sim- 
ilar to ours. However, content-based approaches do not scale well [27], because 
the traces can become very large for message-intensive applications. 

Ordering-based (or control-based) methods trace a system’s control-flow. Our
work fits within this category. Without external non-determinism, replaying the
control-flow will reproduce the data of the recorded run. Ordering-based
methods exist for asynchronous message passing using the message passing interface
(MPI) [19,28]. MPI assumes that messages from the same source are received in
order; this does not generally hold for actor systems. Aumayr et al. [27] study
ordering-based replay for actor systems with a memory-efficient representation 
of the generated traces. Netzer et al. [29] propose an interesting method to only 
trace events directly related to races, rather than all events (removing up to 99% 
of the events). This line of work is complementary to our focus on formal correct- 
ness and low runtime-overhead during record and replay. We believe we could 
benefit from their work to obtain more efficient trace representations. Lanese 
et al. recently proposed a notion of causal-consistent replay based on reversible 
debugging [30], which enables replay to a state by only replaying its causal de- 
pendencies. Similar to our work, they also formalize record & replay for an actor 
language. In contrast to our work, their approach is based on a centralized actor 
for tracing, and can only be used in combination with a debugger [31]. 


8 Conclusion and Future Work 


This paper has introduced a method for global reproducibility for runs of dis- 
tributed Active Object systems, based on local control. The proposed method 
is order-based and decentralized in that local traces are recorded and replayed 
without incurring any additional synchronization at the global level. The method 
is formalized as an operational semantics for a basic active object language, with 
trace recording and replay. This system exhibits non-determinism through the 
scheduling of asynchronous method calls and synchronization using first-class 
futures. Based on this formalization, we justify in terms of properties of trace 
equivalence classes that local control suffices to reproduce runs with a final state 
which is equivalent to the final state of a recorded run. We then discuss how other 
features of active object languages which introduce additional non-determinism 
can be supported by our method, including cooperative concurrency, concurrent 
object groups and resource-aware behavior. 

The proposed method has been implemented for Real-Time ABS, an Active 
Object modeling language which includes most of the above-mentioned features 
and which has a simulator written in Erlang. The implementation only records 
local ordering information, which allows the overhead of both the record and 
replay phases to be kept low compared to deterministic replay systems which 
reproduce an exact global run. 

In future work, we plan to build on the proposed record & replay tool for 
systematic model exploration, by modifying traces between the record and replay 
phase to explore different runs. This can be done by means of DPOR-algorithms 
for actor-based systems [32-34]. Combining DPOR with our proposed tool for 
record & replay would result in a stateless model checker [35] for Active Object 
systems. 

Abstract. Modelling is an essential activity in software engineering processes. It
typically involves two meta-levels: one includes meta-models that describe mod- 
elling languages, and the other contains models built by instantiating those meta- 
models. Multi-level modelling generalizes this approach by allowing models to 
span an arbitrary number of meta-levels. 

A scenario that profits from multi-level modelling is the definition of language 
families that become specialized by successive refinements at subsequent meta- 
levels, hence promoting language reuse. This enables an open set of variability 
options for the possible specializations of a given language. However, multi-level 
modelling lacks the ability to express closed variability regarding the supported 
language primitives and their realizations. This limits the reuse opportunities of 
a language family. To improve this situation, we propose a novel combination of 
product lines with multi-level modelling to cover both open and closed variability. 
Our proposal is backed by a formal theory that guarantees correctness, and is 
implemented atop the METADEPTH multi-level modelling tool. 


Keywords: Meta-modelling, Multi-level modelling, Product lines, Domain-specific 
languages, METADEPTH 


1 Introduction 


Modelling is intrinsic to most engineering disciplines. Within software engineering, it 
plays a pivotal role in model-driven engineering (MDE) [43]. This is a software con- 
struction paradigm where models are actively used to describe, analyse, validate, verify, 
generate code and maintain the application to be built, among other activities. 

Models are built using modelling languages, which can be either general-purpose, 
like the UML [46], or domain-specific languages (DSLs) tailored to a specific con- 
cern [25]. In MDE, the abstract syntax of modelling languages is defined through a 
meta-model that describes the primitives that models can use one meta-level below. 
This modelling approach, which is the standard nowadays, constrains engineers to con- 
fine their models within one meta-level (the “model” level). 

Some researchers have observed that domain modelling can benefit from the use 
of more than one meta-level [6, 14, 17, 19,29]. This way of modelling — called multi- 
level modelling [4] or deep meta-modelling [12] — results in simpler models in scenar- 
ios that involve the type-object pattern [6, 14,30]. Moreover, it permits defining lan- 
guage families (e.g., for process modelling), which can be specialized to specific do- 
mains (e.g., software process modelling, industrial process modelling) via instantiation 
Fig. 1. (a) Open variability through instantiation. (b) Closed variability through product lines.


at lower meta-levels [15]. Instantiation is an open variability mechanism that permits
language customization by specializing the language primitives for a domain, or
adding new ones via linguistic extensions [12]. Fig. 1(a) shows a tiny process modelling 
language that defines the primitive TaskType, which is customized by instantiation in 
the lower meta-level for the software process modelling domain (Coding and Design). 
However, multi-level modelling lacks support for expressing optionality of language 
primitives or alternative primitive realizations. This prevents wider language reuse and 
customization possibilities. 

Software product lines (SPLs) encompass methods, tools and techniques to engi- 
neer collections of similar software systems using a common means of production [32, 
35]. SPLs support closed variability, where a concrete software product is obtained 
by selecting among a finite set of available features (i.e., by setting a configuration). 
SPL techniques have been applied to language engineering to define product lines of 
languages representing a closed set of predefined language variants [20, 34, 47]. As an
example, Fig. 1(b) shows a process modelling language product line with two config- 
urable features: actors and initial tasks. Selecting a configuration of features (in the 
figure, initial tasks but no actors) yields a language variant. Languages so defined can 
be configured with respect to the primitives they offer and their realization, but cannot 
be specialized for specific domains, as this requires open variability mechanisms.

To improve current language reuse techniques, we propose combining multi-level
modelling and product lines. This allows the definition of highly configurable language
families that profit from both open variability (as given by instantiation) and closed
variability (as given by configuration). Accordingly, this paper makes the following
contributions: (i) a novel notion of multi-level model product line; (ii) a theory that
guarantees the correctness of (certain) interleavings of instantiation and configuration steps;
and (iii) an implementation of these ideas on top of the METADEPTH tool [12].

Paper organization. Section 2 introduces multi-level modelling and identifies the chal- 
lenges tackled in this paper. Section 3 provides a light formalization of multi-level mod- 
elling, which is extended with product line techniques in Section 4. Section 5 describes 
tool support. Section 6 discusses related research, and Section 7 ends with the conclu- 
sions and future work. An appendix includes the proofs of the theorems in the paper. 


2 Multi-level modelling: intuition and challenges 


In this section, we introduce the main concepts of multi-level modelling by example 
(Section 2.1), and then discuss the challenges that we aim to tackle (Section 2.2). 
Fig. 2. Commerce example using (a) standard modelling and (b) multi-level modelling.


2.1 Multi-level modelling, by example 


Multi-level modelling permits the definition of models using multiple meta-levels [6, 
14]. To understand its rationale, assume we would like to create a language to define 
commerce information systems (a standard example often used in the multi-level mod- 
elling literature [6, 14]). This language should allow defining product types (like books 
or food) which have a tax, as well as products of the defined types (like Othello or ba- 
nana) which have a price. Moreover, some product types may need to define specific 
properties, like the number of pages in books. 

Fig. 2(a) shows a solution using two meta-levels. In this solution, the meta-model 
of the language uses the type-object pattern [30] to emulate the typing relation between 
Product and ProductType. In addition, classes Attribute and Slot permit defining prop- 
erties in ProductTypes and assigning them a value in Products (called dynamic features 
pattern in [14]). The model in the bottom meta-level represents an information system 
for Kiosks, and defines the product types Book and Food. The model also defines the 
products sold by a particular kiosk: the Othello book and Bananas. 

On reflection, one can realize that this solution emulates two meta-levels within one, 
as we convey with the dashed line in Fig. 2(a). Therefore, Fig. 2(b) shows an alterna- 
tive multi-level solution using three meta-levels. The top level defines just ProductType, 
which is instantiated at the next level to create Book and Food product types, which 
in turn are instantiated at the bottom level to create specific products. Hence, elements 
in this approach are called clabjects [2] (from the contraction of the words class and 
object), as they are types for the elements in the level below, and instances of the ele- 
ments in the level above (see for instance Book). 

The multi-level solution leads to a simpler model (with fewer elements) as it re- 
quires just a clabject to represent both ProductType and Product. However, one needs 
to control the properties of instances beyond the next meta-level. In the example, we 
need to control that the direct instances of ProductType have a tax, and the instances of 
its instances have a price. For this purpose, we use a deep characterization mechanism
called potency [2,4]. This is a natural number, or zero, which governs the instantiation 
depth of elements. Fig. 2(b) depicts the potency after the “@” symbol, and the elements 
that do not declare potency take the potency from their container (e.g., attribute price 
takes its potency from ProductType, and this from the Commerce model). When an el- 
ement is instantiated, the instance gets the potency of the element minus 1. Elements 
with potency 0 are pure instances and cannot be instantiated. This way, attribute Pro- 
ductType.tax is instantiated into Book.tax and Food.tax, which therefore have potency 0 
and can receive values. As model Commerce has potency 2, it can be instantiated at the 
two subsequent meta-levels. The potency of a model is often called its level [6]. 
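
The effect of potency on instantiation depth can be sketched as follows. The Python class `Clabject` below is a hypothetical illustration of the rule "the instance gets the potency of the element minus 1"; it is not the METADEPTH API and ignores attributes and containment.

```python
class Clabject:
    """Hypothetical clabject whose potency decreases on each instantiation."""

    def __init__(self, name, potency, type_=None):
        if potency < 0:
            raise ValueError("potency must be a natural number or zero")
        self.name = name
        self.potency = potency
        self.type = type_    # the clabject this one is an instance of, if any

    def instantiate(self, name):
        if self.potency == 0:
            raise ValueError(f"{self.name} has potency 0 and cannot be instantiated")
        return Clabject(name, self.potency - 1, type_=self)


product_type = Clabject("ProductType", 2)
book = product_type.instantiate("Book")      # potency 1
othello = book.instantiate("Othello")        # potency 0, a pure instance
```

Starting from potency 2, exactly two instantiation steps are possible, which matches the three meta-levels of Fig. 2(b).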

Sometimes, it is not possible to foresee every possible property required by clab- 
ject instances several meta-levels below, like the number of pages in books. To handle 
those cases, multi-level modelling supports linguistic extensions. These are clabjects 
or features with no ontological type, but with a linguistic type which corresponds to 
the meta-modelling primitive used to create it (see Orthogonal Classification Architec- 
ture in [5] for more details). As an example, Book.numPages is a linguistic extension 
modelling a property specific to Book but not to other product types. Instead, in the two- 
level solution in Fig. 2(a), the properties of specific ProductTypes need to be explicitly 
modelled by classes Attribute and Slot, leading to more complexity. 


2.2 Improving reuse in multi-level modelling: some challenges 


Multi-level modelling enables language reuse by supporting the definition of language 
families. For example, Fig. 3 shows at the top a generic process modelling language 
that can be used to define process modelling languages for different domains, like ed- 
ucation, software engineering, or production engineering. The language is designed to 
consider three levels. Level 2 contains the language definition, consisting of primitives 
Fig. 4. Examples of variability needs: (a) optional attributes, (b) optional primitives, (c) alternative primitive realizations.


to define task and gateway types. Level 1 contains language specializations for specific
domains. The figure shows the case for the software engineering domain, which defines 
the task types Requirements and Design, and two gateway types: ReqDep to transition 
from requirement tasks to either design or requirement tasks, and DesignDep to declare 
dependencies between design tasks. Finally, level 0 contains domain-specific processes. 
The one in the figure declares three tasks and one gateway. 

This example shows how instantiation permits customizing the language primitives 
offered at the top level for particular domains, and how linguistic extensions (e.g., at- 
tribute Design.style at level 1 in Fig. 3) allow adding domain-specific primitives to lan- 
guage specializations. However, the following scenarios require further facilities that 
enable a better fit for particular domains and increase language reuse. 


— Alternative realizations. A language primitive may be realised in different ways, 
each more adequate than the others depending on the domain. For example, in 
Fig. 3, dependencies between task types are modelled by GatewayType. However, 
in domains that do not require distinguishing gateway types, a simpler representa- 
tion of dependencies as a reference between TaskTypes is enough (see Fig. 4(c)). 
Unfortunately, multi-level modelling does not support this kind of variability. 

— Primitive excess. Some offered language primitives may be unnecessary in simple 
domains. This can be controlled by not instantiating the primitive, but still, with- 
drawing the needless primitives to simplify the language usage may be a better 
option. Moreover, there are problematic situations. First, if the primitive is an at- 
tribute (like initial in Fig. 4(a)), then it becomes instantiated by force, polluting the 
model with unnecessary information. Second, some mandatory primitives may not 
be needed in certain domains. For example, in Fig. 4(b), the language designer as- 
sumes that any TaskType (e.g., Requirements) will be performed by one ActorKind 
(e.g., Analyst or DomainExpert). However, there may be domains that do not in- 
volve actors (e.g., if tasks are automated), but the mandatory relation perfBy forces 
having instances of ActorKind associated to instances of TaskType. 

— Deferred variability resolution and exploratory modelling. Whether to include
a primitive may not be clear when the language is instantiated for a domain,
and may only be determined later at lower meta-levels. For example,
in Fig. 4(a), an engineer might hesitate whether, in addition to the expected task
duration (attribute duration), s/he may want to store the real task duration (attribute
rDuration with potency 2), in which case, s/he may prefer deferring the decision to
levels 1 or 0. In general, resolving all variability in a language family at the top 
level may be hasty in some cases, as the suitability of a primitive may become 
evident only when a language has reached certain specificity (i.e., at lower meta- 
levels). Moreover, enabling modelling before resolving the variability may be good 
for exploratory purposes. 


To tackle these challenges, we incorporate variability into multi-level models taking 
ideas from SPLs. As a first step, next we formalize multi-level models. 


3 A formal foundation for multi-level modelling 


We start by defining the structure of models equipped with deep characterization, which
we call deep models. We represent models at different meta-levels in a uniform way, in 
order to cope with an arbitrary number of meta-levels. For simplicity of presentation, 
we omit inheritance, cardinalities and integrity constraints in our formalization. 


Def. 1 (Deep model) A deep model is a tuple $M = (p, C, S, R, src, tar, pot)$, where:

— $p \in \mathbb{N}_0$ is called the model potency, or level.

— $C$, $S$ and $R$ are disjoint sets of clabjects, slots and references, respectively.

— $src\colon S \cup R \to C$ is a function assigning slots and references to clabjects.

— $tar\colon R \to C$ is a function assigning the target clabject to references.

— $pot\colon C \cup S \cup R \to \mathbb{N}_0$ is a function assigning a potency to each element, s.t.:
1. $\forall e \in C \cup S \cup R \bullet pot(e) \le p$
2. $\forall s \in S \cup R \bullet pot(s) \le pot(src(s))$
3. $\forall r \in R \bullet pot(r) \le pot(tar(r))$


In the previous definition, we assign a level p to deep models. Elements in a deep 
model have a potency via function pot, which must satisfy three conditions: (1) the 
potency of an element should not be larger than the model level, (2) the potency of slots 
and references should not be larger than the one of their container clabject, and (3) the 
potency of references should not be larger than the one of the clabjects they point to. 
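
As an informal check of Def. 1, the following Python sketch encodes a deep model as a dataclass and reports which of the three potency conditions are violated. The class `DeepModel`, the method `violations`, and the string identifiers are assumptions made for this example; the potencies of the Commerce model follow the description of Fig. 2(b). Disjointness of the three sets is not checked here.

```python
from dataclasses import dataclass


@dataclass
class DeepModel:
    p: int       # model potency (level)
    C: set       # clabjects
    S: set       # slots
    R: set       # references
    src: dict    # slot or reference -> owning clabject
    tar: dict    # reference -> target clabject
    pot: dict    # element -> potency

    def violations(self):
        """Return the violated conditions of Def. 1 as readable strings."""
        found = []
        for e in sorted(self.C | self.S | self.R):
            if self.pot[e] > self.p:
                found.append(f"1: pot({e}) > model level")
        for s in sorted(self.S | self.R):
            if self.pot[s] > self.pot[self.src[s]]:
                found.append(f"2: pot({s}) > pot(src({s}))")
        for r in sorted(self.R):
            if self.pot[r] > self.pot[self.tar[r]]:
                found.append(f"3: pot({r}) > pot(tar({r}))")
        return found


commerce = DeepModel(
    p=2,
    C={"ProductType"},
    S={"tax", "price"},
    R=set(),
    src={"tax": "ProductType", "price": "ProductType"},
    tar={},
    pot={"ProductType": 2, "tax": 1, "price": 2},
)
```

For the Commerce model, `commerce.violations()` returns an empty list; raising the potency of `tax` above 2 would violate conditions 1 and 2.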

Next, we define a general notion of mapping (a morphism) between deep models 
as a tuple of three (total) functions between the sets of clabjects, slots and references. 
Each morphism has a depth (a natural number, or zero) controlling the distance between the levels
of the involved models. We use two particular types of mappings to represent the type 
relation between deep models at adjacent meta-levels (when the morphism depth is 1), 
and extensions of a deep model to add linguistic extensions (when the depth is 0). 


Def. 2 (D-morphism, type and extension) Given two deep models $M_i = (p_i, C_i, S_i, R_i,
src_i, tar_i, pot_i)$ for $i \in \{0, 1\}$, a deep model morphism (D-morphism in short) $m =
(d, m_C, m_S, m_R)\colon M_0 \to M_1$ is a tuple made of a number $d \in \mathbb{N}_0$ called depth, and
three functions $m_C\colon C_0 \to C_1$, $m_S\colon S_0 \to S_1$ and $m_R\colon R_0 \to R_1$ s.t.:

1. $p_0 + d = p_1$
2. $\forall e \in X_0 \bullet pot_0(e) + d = pot_1(m_X(e))$ (for $X \in \{C, S, R\}$)
3. Each function $m_C$, $m_S$, $m_R$ commutes with functions $src_i$ and $tar_i$ (see Fig. 5)

[Diagrams: $m_C \circ src_0 = src_1 \circ m_S$ on slots; $m_C \circ src_0 = src_1 \circ m_R$ and $m_C \circ tar_0 = tar_1 \circ m_R$ on references.]

Fig. 5. Commutativity conditions for D-morphisms.

D-morphism $tp = (d, tp_C, tp_S, tp_R)\colon M_0 \to M_1$ is called type if $d = 1$, and is called
indirect type if $d > 1$. $M_1$ is called the (indirect) model type of $M_0$.

D-morphism $ex = (d, ex_C, ex_S, ex_R)\colon M_0 \to M_1$ is called level-preserving if $d = 0$.
A level-preserving D-morphism $ex$ is called extension if each $ex_X$ (for $X \in \{C, S, R\}$)
is an inclusion. An extension is called identity if each $ex_X$ is surjective.


In the previous definition, condition 1 ensures that the D-morphism connects models 
of suitable levels, condition 2 checks that the potency decreases according to the depth 
of the D-morphism, and condition 3 ensures that the D-morphism is coherent with the 
source and target of slots and references (just like in standard graph morphisms [16]). 
We use total functions to represent the type, which ensures that each element in a deep 
model has a type. Linguistic extensions are not typed, but they are modelled as an exten- 
sion D-morphism of a (typed) deep model into a larger model. This avoids resorting to 
partial functions to represent the type, which would complicate the formalization [38]. 
Identity extensions map isomorphic deep models. D-morphisms can be composed by 
composing the three mappings and adding their depths. 
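
The three conditions of Def. 2 can be checked mechanically. The Python sketch below is an illustration under simplifying assumptions: deep models are plain dictionaries, a single dictionary `m` plays the role of the three functions, and the function name `is_d_morphism` as well as the element identifiers are hypothetical. The example types a Book clabject by ProductType, following Fig. 2(b).

```python
def is_d_morphism(d, m, M0, M1):
    """Check conditions 1-3 of Def. 2 for a candidate mapping m: M0 -> M1.

    Models are dicts with keys p, C, S, R, src, tar, pot.
    """
    # Condition 1: the levels differ exactly by the depth.
    if M0["p"] + d != M1["p"]:
        return False
    for kind in ("C", "S", "R"):
        for e in M0[kind]:
            # Totality: every element is mapped into the set of the same kind.
            if m.get(e) not in M1[kind]:
                return False
            # Condition 2: potencies are shifted by the depth.
            if M0["pot"][e] + d != M1["pot"][m[e]]:
                return False
    # Condition 3: the mapping commutes with src and tar.
    for e in M0["S"] | M0["R"]:
        if m[M0["src"][e]] != M1["src"][m[e]]:
            return False
    for r in M0["R"]:
        if m[M0["tar"][r]] != M1["tar"][m[r]]:
            return False
    return True


commerce = {
    "p": 2, "C": {"ProductType"},
    "S": {"ProductType.tax", "ProductType.price"}, "R": set(),
    "src": {"ProductType.tax": "ProductType", "ProductType.price": "ProductType"},
    "tar": {},
    "pot": {"ProductType": 2, "ProductType.tax": 1, "ProductType.price": 2},
}
kiosk = {
    "p": 1, "C": {"Book"}, "S": {"Book.tax", "Book.price"}, "R": set(),
    "src": {"Book.tax": "Book", "Book.price": "Book"},
    "tar": {},
    "pot": {"Book": 1, "Book.tax": 0, "Book.price": 1},
}
tp = {"Book": "ProductType", "Book.tax": "ProductType.tax",
      "Book.price": "ProductType.price"}
```

With these definitions, `is_d_morphism(1, tp, kiosk, commerce)` holds, so `tp` is a type; the same mapping with depth 0 is rejected by condition 1.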

A multi-level model is made of a root deep model, and a sequence of instantia- 
tions and extensions. The length of this sequence is equal to the root model level. The 
extensions are allowed to be identity extensions. 


Def. 3 (Multi-level model) A multi-level model $MLM = (M'_0, ML = ((M'_i \xleftarrow{tp_{i+1}}
M_{i+1} \xrightarrow{ex_{i+1}} M'_{i+1}))_{i=0..p_0-1})$ is made of a deep model $M'_0$ called the root and a
sequence $ML$ (of length $p_0$, the level of $M'_0$) of spans of D-morphisms, where the left
D-morphism is a type and the right D-morphism a (possibly identity) extension.


Example. Fig. 6 shows a multi-level model (an excerpt of the one in Fig. 3) according 
to Def. 3. Slots are represented as rounded nodes, instead of inside the owner clabject 
box. In Fig. 3, we do not show slots with potency bigger than 0 that are typed, like 
Design.duration at level 1, which is omitted. However, such instances do exist, and are 
explicitly shown in Fig. 6 (see slot duration’@1 in models $M_1$ and $M'_1$). If a model
does not include linguistic extensions (like $M_2$), then we use the identity extension
D-morphism. Finally, it would be possible to derive the (indirect) type of $M_2$ w.r.t. $M'_0$ by
defining a construction akin to a pullback that yields the part of $M_2$ typed by $M'_0$ [28].


4 Multi-level model product lines


In order to solve the challenges identified in Section 2.2, we extend deep models with 
closed variability options by borrowing concepts from product lines. We use feature 
models [24] to represent the allowed variability. 

$F = \{\mathit{ProcessLanguage}, \mathit{Gateways}, \mathit{actors}, \mathit{Tasks}, \mathit{simple}, \mathit{object}, \mathit{initial}, \mathit{enactment}\}$,
$\Phi = \mathit{ProcessLanguage} \wedge \mathit{Gateways} \wedge \mathit{Tasks} \wedge ((\neg \mathit{simple} \wedge \mathit{object}) \vee (\mathit{simple} \wedge \neg \mathit{object}))$

Fig. 7. Feature model for the example. (a) Feature diagram notation. (b) Using Def. 4.


Def. 4 (Feature model) A feature model $FM = (F, \Phi)$ is a tuple made of a set $F$ of
features and a propositional formula $\Phi$ specifying the valid feature configurations.


Example. Fig. 7 shows the feature model for the running example using both the fea- 
ture diagram notation (a), and our definition (b). The feature model permits choosing if 
the process modelling language will have primitives to define actors (feature actors, cf. 
Fig. 4(b)), initial tasks and their enactment at level 0 (features initial and enactment, cf. 
Fig. 4(a)), as well as selecting whether gateways are to be represented either as refer- 
ences or objects (features simple and object, cf. Fig. 4(c)). The feature model includes 
the mandatory features ProcessLanguage, Gateways and Tasks as syntactic sugar to 
obtain a tree representation, but they are not needed in our formalization. 


The selection of one option within the variability space offered by a feature model 
is done through a configuration. This assigns true to the selected features, and false to 
the discarded ones. To enhance flexibility of use, we also support partial configurations, 
where some features are not given any value. This will be used to allow deferring the 
resolution of some variability options to lower meta-levels.

Def. 5 (Configuration) Given a feature model $FM = (F, \Phi)$, a configuration of $FM$
is a tuple $C = (F^+, F^-)$ made of two disjoint sets $F^+ \subseteq F$ and $F^- \subseteq F$, s.t.
$\Phi[F^+/true, F^-/false] \not\equiv false$. $C$ is total if $F = F^+ \cup F^-$, otherwise it is partial.


In the previous definition, $F^+$ contains the selected features (i.e., given the value
true), $F^-$ the discarded features (i.e., given the value false), and $F \setminus (F^+ \cup F^-)$ is
the set of features whose value has not been set. A configuration must be compatible
with the feature model formula, so the definition demands that the formula $\Phi$, once we
substitute $F^+$ by true and $F^-$ by false, is not false. If the configuration is total, then the
condition entails that $\Phi$ must evaluate to true.
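
The check in Def. 5 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the formula is encoded as a Python predicate over a truth assignment, and the helper names (`is_configuration`, `is_total`) are hypothetical and not part of the paper or of METADEPTH. Non-falsity of the partially evaluated formula is decided by brute-force enumeration of the unset features, which is adequate for small feature models only.

```python
from itertools import product

def is_configuration(features, phi, selected, discarded):
    """Def. 5 sketch: (selected, discarded) plays the role of (F+, F-)."""
    if selected & discarded or not (selected | discarded) <= features:
        return False
    free = sorted(features - selected - discarded)
    fixed = {**{f: True for f in selected}, **{f: False for f in discarded}}
    # Phi[F+/true, F-/false] is not equivalent to false
    # iff at least one completion of the unset features satisfies Phi.
    return any(phi({**fixed, **dict(zip(free, values))})
               for values in product([False, True], repeat=len(free)))

def is_total(features, selected, discarded):
    """A configuration is total when every feature has a value."""
    return features == selected | discarded

# Feature model of the running example (Fig. 7(b)).
F = {"ProcessLanguage", "Gateways", "Tasks", "actors",
     "simple", "object", "initial", "enactment"}

def phi(v):
    return (v["ProcessLanguage"] and v["Gateways"] and v["Tasks"]
            and ((not v["simple"] and v["object"])
                 or (v["simple"] and not v["object"])))

print(is_configuration(F, phi, {"object"}, {"simple"}))     # partial, valid
print(is_configuration(F, phi, {"simple", "object"}, set()))  # violates the alternative
```

Selecting both simple and object is rejected because no completion of the remaining features can satisfy the alternative constraint.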

Next, we assign a level to feature models, and potencies to features, in order to 
restrict the level at which features can be assigned a value. 


Def. 6 (Deep feature model) A deep feature model $DFM = (l, FM = (F, \Phi), pot)$
is made of a level $l \in \mathbb{N}_0$, a feature model $FM$, and a function $pot\colon F \to \mathbb{N}_0$ assigning
a potency to each feature, s.t. $\forall f \in F \bullet pot(f) \le l$.


Next, we define a mapping between deep feature models, called F-morphism. Sim- 
ilar to D-morphisms (cf. Def. 2), F-morphisms have a depth which can be positive or 0. 
In addition, they include a configuration, and a mapping for the features excluded from 
the configuration. There are two special kinds of F-morphisms: one representing a type 
relationship between feature models (where the morphism depth is 1 and the configuration
is empty), and the other expressing a specialization relationship between two feature
models via a total or partial configuration (where the morphism depth is 0). 


Def. 7 (F-morphism, type and specialization) Given two deep feature models $DFM_i = (l_i, FM_i, pot_i)$ (for $i = \{0, 1\}$), a deep feature model morphism (F-morphism in short)
$m = (d, m_F, C)\colon DFM_0 \to DFM_1$ is made of:


- a depth $d \in \mathbb{N}_0$ s.t. $l_0 + d = l_1$
- an injective set morphism $m_F\colon F_0 \to F_1$ s.t. $\forall f \in F_0 \bullet pot_0(f) + d = pot_1(m_F(f))$
- a configuration $C = (F_1^+, F_1^-)$ of $FM_1$ s.t.:

  1. $m_F(F_0) = F_1 \setminus (F_1^+ \cup F_1^-)$

  2. $\Phi_1[F_1^+/true, F_1^-/false] \equiv \Phi_0[F_0/m_F(F_0)]$


F-morphism $tp$ is a type morphism if $d = 1$ and $C = (\emptyset, \emptyset)$, and it is an indirect type
morphism if $d > 1$ and $C = (\emptyset, \emptyset)$. F-morphism $sp$ is a specialization if $d = 0$.


The definition requires that the F-morphism depth fills the gap between the feature
model levels, and between the potencies of the mapped features. $FM_0$ may have fewer
features than $FM_1$, in case the configuration $C$ assigns a value to features of $FM_1$. In
particular, the injectivity condition of $m_F$ and requiring $m_F(F_0) = F_1 \setminus (F_1^+ \cup F_1^-)$
ensure that only the features left undefined by $C$ are mapped from $FM_0$. Moreover,
when the configuration $C$ assigns a value to some feature, we require that the formula
$\Phi_1$, once we substitute the features in $C$ by their value true or false, be equivalent to
$\Phi_0$, once we substitute the features in $F_0$ by their mapping in $F_1$. This corresponds to a
(partial) evaluation of the formula $\Phi_1$ as a result of a feature model specialization.

As a remark, F-morphisms so defined are composable by adding their depths and
making the union of the positive (resp. negative) features in the configurations.

[Figure 8 content: three deep feature models related by F-morphisms $tp$ and $sp$.
$DFM_2 = (2, F_2 = \{Gwys, simple, object\}, \Phi_2 = Gwys \wedge ((\neg simple \wedge object) \vee (simple \wedge \neg object)), pot_2)$, with $pot_2(simple) = pot_2(object) = 1$.
$DFM_1 = (1, F_1 = \{Gwys, simple, object\}, \Phi_1 = Gwys \wedge ((\neg simple \wedge object) \vee (simple \wedge \neg object)), pot_1)$, with $pot_1(simple) = pot_1(object) = 0$.
$DFM_0 = (1, F_0 = \{Gwys\}, \Phi_0 = Gwys, pot_0 = \{(Gwys, 1)\})$.
$sp$ has $d = 0$ and $C = (F^+ = \{object\}, F^- = \{simple\})$.]

Fig. 8. Examples of F-morphisms.


Example. Fig. 8 shows two F-morphisms, with $tp$ a type and $sp$ a specialization.
F-morphism $tp\colon FM_1 \to FM_2$ relates two deep feature models $FM_1$ and $FM_2$, where
the level and potencies of $FM_1$ are one less than those in $FM_2$, and the formulae
are the same modulo feature renaming. Specialization $sp\colon FM_0 \to FM_1$ has depth 0
and partial configuration $C = (F^+ = \{object\}, F^- = \{simple\})$. Hence, the levels
and potencies are maintained, but the feature set $F_0$ is decreased by removing from $F_1$
the features that appear in $C$. According to condition 1 in Def. 7, $\{Gwys\} = \{Gwys, simple, object\} \setminus (\{simple\} \cup \{object\})$. According to condition 2 in the definition, the
formula $\Phi_0$ is equivalent to replacing object by true and simple by false in $\Phi_1$. If we
compose $sp$ with $tp$, the resulting F-morphism $tp \circ sp$ has depth 1 and configuration
$C = (F^+ = \{object\}, F^- = \{simple\})$, which is neither a type nor a specialization.
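
The conditions of Def. 7 and the composition remark can be replayed on the Fig. 8 example with a small Python sketch. Everything below is an illustrative assumption rather than the authors' implementation: deep feature models are encoded as tuples (level, features, formula, potencies), formulas as Python predicates, and the function names are hypothetical. The potencies of Gwys in the two upper models are not legible in the figure; the values 1 and 2 used here are derived from the potency condition of Def. 7. The check that the configuration is non-false (Def. 5) is left out for brevity.

```python
from itertools import product

def equivalent(f, g, variables):
    """Brute-force check that two formulas agree on every assignment."""
    names = sorted(variables)
    return all(f(dict(zip(names, vals))) == g(dict(zip(names, vals)))
               for vals in product([False, True], repeat=len(names)))

def is_f_morphism(dfm0, dfm1, d, m_f, sel, dis):
    """Check the conditions of Def. 7 for m = (d, m_f, (sel, dis)): dfm0 -> dfm1."""
    l0, f0, phi0, pot0 = dfm0
    l1, f1, phi1, pot1 = dfm1
    if set(m_f) != f0 or len(set(m_f.values())) != len(f0):  # total and injective
        return False
    image = set(m_f.values())
    if d < 0 or l0 + d != l1:                         # depth fills the level gap
        return False
    if sel & dis or image != f1 - (sel | dis):        # condition 1
        return False
    if any(pot0[f] + d != pot1[m_f[f]] for f in f0):  # depth fills the potency gap
        return False
    fixed = {**{f: True for f in sel}, **{f: False for f in dis}}
    lhs = lambda v: phi1({**v, **fixed})              # Phi_1[F+/true, F-/false]
    rhs = lambda v: phi0({f: v[m_f[f]] for f in f0})  # Phi_0[F_0/m_F(F_0)]
    return equivalent(lhs, rhs, image)                # condition 2

def compose(inner, outer):
    """Compose inner: A -> B with outer: B -> C (depths add, configurations unite)."""
    d1, m1, sel1, dis1 = inner
    d2, m2, sel2, dis2 = outer
    return (d1 + d2, {f: m2[m1[f]] for f in m1},
            {m2[f] for f in sel1} | sel2, {m2[f] for f in dis1} | dis2)

def gateways(v):
    return v["Gwys"] and ((not v["simple"] and v["object"])
                          or (v["simple"] and not v["object"]))

F = {"Gwys", "simple", "object"}
dfm2 = (2, F, gateways, {"Gwys": 2, "simple": 1, "object": 1})
dfm1 = (1, F, gateways, {"Gwys": 1, "simple": 0, "object": 0})
dfm0 = (1, {"Gwys"}, lambda v: v["Gwys"], {"Gwys": 1})

sp = (0, {"Gwys": "Gwys"}, {"object"}, {"simple"})
tp = (1, {f: f for f in F}, set(), set())
tp_after_sp = compose(sp, tp)

print(is_f_morphism(dfm0, dfm1, *sp), is_f_morphism(dfm1, dfm2, *tp))
print(tp_after_sp[0], tp_after_sp[2], tp_after_sp[3])
```

The composite has depth 1 and a non-empty configuration, so it is neither a type nor a specialization, as noted in the example.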


Finally, we are ready to characterize deep model product lines (PLs) as a deep 
model, a deep feature model with the same level as the deep model, and a mapping 
of presence conditions (PCs) to deep model elements. 


Def. 8 (Deep model PL) A deep model PL $DM = (M, DFM, \phi)$ is made of:


- A deep model $M$ and a deep feature model $DFM$ with the same level ($p = l$).
- A function $\phi\colon C \cup S \cup R \to \mathbb{B}(F)$ mapping each element in $M$ to a (non-false)
  propositional formula over the features in $F$, called presence condition (PC), s.t.:
  1. $\forall s \in S \cup R \bullet \phi(s) \Rightarrow \phi(src(s))$
  2. $\forall r \in R \bullet \phi(r) \Rightarrow \phi(tar(r))$
  3. $\forall e \in C \cup S \cup R, \forall v \in Var(\phi(e)) \bullet pot(v) \le pot(e)$


Intuitively, given a configuration, we can derive a product (a deep model) of the 
PL by deleting the model elements whose PC evaluates to false. To avoid dangling 
references and slots, Def. 8 requires their PC not to be weaker than that of their owning 
clabject (condition 1), and the PC of references not to be weaker than the one of their 
target clabject (condition 2). In addition, the variability of an element must be resolved 
in a level that contains the element. To this aim, condition 3 ensures that the potency of 
the variables in the PC of an element is not higher than the element’s potency (we use 
function Var to return all variables within a propositional formula).

[Figure 9 content: deep feature model (left) and deep model (right), with PCs in square brackets, e.g., initial [initial], rDuration [enactment] and GatewayType [object].]


Fig. 9. Deep model PL example. 


Example. Fig. 9 shows a deep model PL for process modelling languages. The left 
compartment shows the deep feature model, and the one to the right the deep model 
with its elements annotated with their PC between square brackets. If an element does 
not show a PC (like TaskType), then its PC is true. The deep model PL permits select- 
ing between two alternative realizations for gateways, either as the reference next or 
the clabject GatewayType. This variability needs to be resolved before instantiating the 
language for a specific domain, as features simple and object have potency 0. The PL 
also offers the choice to add or not the primitive ActorKind to the language, but this de- 
cision can be taken before specializing the language or at level 1 to enable exploratory 
modelling. Finally, the PL allows selecting whether tasks can be initial and whether they 
hold enactment information. Feature initial in the feature model cannot have potency 
2 because the feature is used in the PC of attribute TaskType.initial, which has potency
1. The feature model shows features ProcessLanguage, Gateways and Tasks in colour
and without a potency; this is so as these features are mandatory (i.e., their value is true
in any valid configuration), and while they enable a hierarchical representation of the 
feature model, the formalization of the example does not include them. 
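
The three conditions of Def. 8 can be checked mechanically on an excerpt of this example. The following Python sketch is an illustration under our own assumptions: PCs are encoded as predicates together with their explicit variable sets, the function name `is_deep_model_pl` is hypothetical, and the element potencies other than that of TaskType.initial are assumed to default to the model level (2).

```python
from itertools import product

def assignments(features):
    """Enumerate all truth assignments over a set of features."""
    names = sorted(features)
    for vals in product([False, True], repeat=len(names)):
        yield dict(zip(names, vals))

def is_deep_model_pl(feat_pot, elems, src, tar):
    """elems: name -> (potency, PC variables, PC predicate);
    src: slot/reference -> owner clabject; tar: reference -> target clabject."""
    features = set(feat_pot)

    def implies(a, b):
        return all(not elems[a][2](v) or elems[b][2](v)
                   for v in assignments(features))

    for pot_e, pc_vars, pc in elems.values():
        if not any(pc(v) for v in assignments(features)):  # PC must be non-false
            return False
        if any(feat_pot[x] > pot_e for x in pc_vars):       # condition 3
            return False
    return (all(implies(s, o) for s, o in src.items())      # condition 1
            and all(implies(r, t) for r, t in tar.items())) # condition 2

TRUE = lambda v: True
feat_pot = {"actors": 1, "simple": 0, "object": 0, "initial": 1, "enactment": 2}
elems = {
    "TaskType":         (2, set(),       TRUE),
    "TaskType.initial": (1, {"initial"}, lambda v: v["initial"]),
    "TaskType.perfBy":  (2, {"actors"},  lambda v: v["actors"]),
    "ActorKind":        (2, {"actors"},  lambda v: v["actors"]),
    "GatewayType":      (2, {"object"},  lambda v: v["object"]),
    "GatewayType.src":  (2, {"object"},  lambda v: v["object"]),
}
src = {"TaskType.initial": "TaskType", "TaskType.perfBy": "TaskType",
       "GatewayType.src": "GatewayType"}
tar = {"TaskType.perfBy": "ActorKind", "GatewayType.src": "TaskType"}

print(is_deep_model_pl(feat_pot, elems, src, tar))
# Raising the potency of feature initial to 2 violates condition 3.
print(is_deep_model_pl({**feat_pot, "initial": 2}, elems, src, tar))
```

The second call fails for the reason given in the example: TaskType.initial has potency 1, so a feature with potency 2 cannot appear in its PC.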


Next, we introduce mappings between deep model PLs (called PL-morphisms) as a 
tuple of morphisms between their constituent deep models and deep feature models. As 
in the previous cases, we are interested in type morphisms, linguistic extensions, and 
specializations of deep model PLs via a (partial) configuration. 


Def. 9 (PL-morphism, type, extension, specialization) Given two deep model PLs
$DM_i = (M_i, DFM_i, \phi_i)$ (for $i = \{0, 1\}$), a PL-morphism $m = (m^D, m^F)$ is made
of a D-morphism $m^D\colon M_0 \to M_1$ and an F-morphism $m^F\colon DFM_0 \to DFM_1$ with
configuration $C = (F_1^+, F_1^-)$, s.t. $\forall e \in C_0 \cup S_0 \cup R_0 \bullet \phi_1(e)[F_1^+/true, F_1^-/false] \equiv \phi_0(e)[F_0/m^F_F(F_0)]$.

PL-morphism $tp = (tp^D, tp^F)$ is a type if both $tp^D$ and $tp^F$ are types.

PL-morphism $ex = (ex^D, id^F)$ is an extension if $ex^D$ is an extension and $id^F$ is an
identity.

PL-morphism $sp = (m^D, sp^F)$ is a specialization if $sp^F$ is a specialization and $m^D$
is injective, level-preserving, and the elements $e \in C_1 \cup S_1 \cup R_1$ s.t. $\phi_1(e)[F_1^+/true, F_1^-/false] \not\equiv false$ are in its co-domain.


Remark. No condition on the equality of depths of $m^D$ and $m^F$ is required, since the
levels of $M_0$ and $DFM_0$ are the same (and similarly for the levels of $M_1$ and $DFM_1$).

Fig. 10. Examples of PL-morphisms and deferred configuration.


The condition for PL-morphisms demands that the PCs in the deep model $M_0$ are modified
according to the selection of features in configuration $C$ of $m^F$. In addition, in
specialization PL-morphisms, $M_0$ should contain just the elements whose PC is not
false after substituting the features in $F^+$ by true, and the ones in $F^-$ by false. Therefore,
in case of a specialization, the definition requires that, when the configuration $C$ is
considered, exactly the elements in $M_1$ whose PC is not false receive a mapping from
$M_0$, while the mapping needs to be injective. Moreover, by Def. 8 of deep model PL,
no element in $M_0$ can have a PC that is false.


Other kinds of PL-morphisms are possible, for example, adding features to a fea- 
ture model in lower meta-levels to increase its variability. While this is an interesting 
possibility to increase language reuse, we leave its formalization to future work. 


Example. Fig. 10 shows four valid PL-morphisms ($tp$, $tp'$, $sp$, $sp'$) and an invalid one
($ex$). Both $tp$ and $tp'$ are types: they relate models at adjacent levels, where one is an
instance of the other. Types always use the empty configuration $C = (\emptyset, \emptyset)$ (cf. Def. 7),
and therefore, a model element and its instances have the same PC (see, e.g., ActorKind
and its instance SoftEng). Both $sp$ and $sp'$ are specialization PL-morphisms. This is
so as they preserve level and potencies, and the deep models only contain elements
with non-false PC. As the configuration $C$ of both PL-morphisms is total, the PC of
the elements in $DM_2$ and $DM_3$ evaluates to true, and hence, these models do not have
more closed variability options to configure (i.e., they are final products of the PL).
The figure also shows an attempt to extend $DM_1$ by a linguistic extension made of the
clabject Skill connected to SoftEng through reference exp. However, the result is not a
valid deep model PL as the PC of SoftEng (actors) is stronger than the PC of exp (true).
This could be solved by adding actors as PC of exp (and Skill).


When the configuration $C$ of a specialization PL-morphism $sp$ is total, $DM_0$ is a
product of $DM_1$ with no variability, being equivalent to a deep model (cf. Def. 1). However,
the question remains whether for any valid configuration $C$ of a deep model PL
$DM$, we can find a deep model PL $DM'$ and a specialization PL-morphism $sp\colon DM' \to DM$
that uses $C$. This requires showing that any choice of $F^+$ and $F^-$ results in a valid
deep model PL $DM'$ as given by Def. 8. Theorem 1 captures this result.

Theorem 1 (Derivation through specialization morphisms). Given a deep model PL
$DM = (M, DFM, \phi)$ and any configuration $C$ of $DFM$, there is one deep model PL
$DM'$ and a specialization morphism $sp\colon DM' \to DM$ with configuration $C$.


Proof. In appendix. 
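
The construction behind Theorem 1 (keep the elements whose PC is not false under the configuration, and partially evaluate the remaining PCs) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the deep model is reduced to a map from element names to PCs, PCs are Python predicates, and the function name `specialize` is hypothetical. The element names follow Listing 1.

```python
from itertools import product

def assignments(features):
    """Enumerate all truth assignments over a set of features."""
    names = sorted(features)
    for vals in product([False, True], repeat=len(names)):
        yield dict(zip(names, vals))

def specialize(features, elems, selected, discarded):
    """Return (F', elements with partially evaluated PCs) for C = (selected, discarded)."""
    fixed = {**{f: True for f in selected}, **{f: False for f in discarded}}
    free = features - selected - discarded           # F' = F \ (F+ u F-)

    def restrict(pc):
        return lambda v: pc({**v, **fixed})          # phi(e)[F+/true, F-/false]

    kept = {}
    for name, pc in elems.items():
        new_pc = restrict(pc)
        if any(new_pc(v) for v in assignments(free)):  # drop elements whose PC became false
            kept[name] = new_pc
    return free, kept

features = {"actors", "simple", "object", "initial", "enactment"}
elems = {
    "TaskType":           lambda v: True,
    "TaskType.initial":   lambda v: v["initial"],
    "TaskType.rDuration": lambda v: v["enactment"],
    "TaskType.next":      lambda v: v["simple"],
    "TaskType.perfBy":    lambda v: v["actors"],
    "ActorKind":          lambda v: v["actors"],
    "GatewayType":        lambda v: v["object"],
    "GatewayType.src":    lambda v: v["object"],
}

free, kept = specialize(features, elems, {"object"}, {"simple"})
print(sorted(free))
print(sorted(kept))
```

With the partial configuration that selects object and discards simple, the reference TaskType.next disappears, the PC of GatewayType becomes true, and the features actors, initial and enactment remain open.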


Next, we look into the soundness of deferring the configuration of an element until after
it is instantiated. The question is whether, in any situation that allows configuring an
element after its instantiation, we obtain the same result by resolving the element variability
first and then instantiating. This result is important as, regardless of the order
in which configurations and instantiations are performed, we can calculate the language
that results from applying the configurations as the first step, by advancing the configuration
steps over the instantiations.

The next theorem captures the fact that if we can instantiate and then configure, then 
we obtain the same result if we configure and then instantiate. 


Theorem 2 (Specialization can be advanced to instantiation). Given three deep model
PLs $DM_i = (M_i, DFM_i, \phi_i)$ (for $i = \{0, 1, 2\}$), a type PL-morphism $tp\colon DM_1 \to DM_0$
and a specialization PL-morphism $sp\colon DM_2 \to DM_1$, there is a unique deep
model PL $DM_3$, a unique type PL-morphism $tp'\colon DM_2 \to DM_3$ and a unique
specialization PL-morphism $sp'\colon DM_3 \to DM_0$ s.t. the diagram in Fig. 11 commutes.


DM_0  <--sp'--  DM_3
  ^               ^
  | tp            | tp'
DM_1  <--sp---  DM_2


Fig. 11. Deferred configuration: specialization can be advanced to instantiation. 


Proof. In appendix. 


Remark. Note that the converse is not true in general, that is, instantiation cannot be 
advanced to specialization. The reason is that a type morphism is not allowed from 
features with potency 0, meaning that they must be configured first. 


Example. Fig. 10 shows a deferred configuration. Deep model PL $DM_0$ is instantiated
into $DM_1$, and then configured using $C = (F^+ = \{\}, F^- = \{actors\})$ to yield $DM_2$.
Alternatively, we obtain the same result by first configuring $DM_0$ to yield $DM_3$, and then
instantiating $DM_3$ into $DM_2$. Deep model PL $DM_3$ is relevant as it corresponds to the
fully-configured language (i.e., with no variability) employed to build $DM_2$.
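
The commuting square of Theorem 2 can be illustrated on a toy encoding. The sketch below makes strong simplifying assumptions that are ours, not the paper's: a model is a map from element names to a pair (type, PC), a PC is either `None` (true) or a single feature name, and an instance simply copies the PC of its type. The clabject Coding and the function names are hypothetical; ActorKind, SoftEng and the configuration that discards actors come from the example.

```python
def configure(model, selected, discarded):
    """Drop elements whose PC is a discarded feature; resolve selected PCs to true."""
    result = {}
    for name, (typ, pc) in model.items():
        if pc in discarded:
            continue
        result[name] = (typ, None if pc in selected else pc)
    return result

def instantiate(type_model, instances):
    """instances: name -> type name. Each instance copies the PC of its type;
    instances whose type is absent from type_model cannot be created."""
    return {name: (typ, type_model[typ][1])
            for name, typ in instances.items() if typ in type_model}

dm0 = {"TaskType": (None, None), "ActorKind": (None, "actors")}
instances = {"Coding": "TaskType", "SoftEng": "ActorKind"}
selected, discarded = set(), {"actors"}

# Instantiate first, then configure (DM_0 -> DM_1 -> DM_2).
dm1 = instantiate(dm0, instances)
dm2 = configure(dm1, selected, discarded)

# Configure first, then instantiate (DM_0 -> DM_3 -> DM_2).
dm3 = configure(dm0, selected, discarded)
dm2_alt = instantiate(dm3, instances)

print(dm2 == dm2_alt, dm2)
```

Both paths yield the same model, containing only Coding. The sketch does not model potencies, so it cannot show the restriction stated in the remark that features with potency 0 must be configured before instantiation.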


5 Tool support 


We have implemented the notions presented so far atop METADEPTH [12]. This is a textual
multi-level modelling tool which supports an arbitrary number of meta-levels and
deep characterization through potency. It integrates the Epsilon family of languages for
model management [33], which permits defining code generators and model transformations
for multi-level models.

 1 @Variability(model="ProcessOptions")
 2 Model ProcessModel@2 {
 3   Node TaskType {
 4     @Presence(condition="initial")
 5     initial@1 : boolean = false;
 6     duration : int;
 7     @Presence(condition="enactment")
 8     rDuration : int;
 9     @Presence(condition="simple")
10     next : TaskType;
11     @Presence(condition="actors")
12     perfBy : ActorKind;
13   }
14
15   @Presence(condition="actors")
16   Node ActorKind;
17
18   @Presence(condition="object")
19   Node GatewayType {
20     src : TaskType[*];
22   }
23 }
Listing 1. Deep model with PCs.

1 FeatureModel ProcessOptions@2 {
2   ProcessLanguage : Gateways Tasks actors?@1;
3   alt Gateways : simple@0 object@0;
4   Tasks : initial?@1 enactment?@2;
5 }
Listing 2. Deep feature model.

1 config ProcessModel with { !simple }
Listing 3. Feature configuration.

[Figure 12 content: models ProcessModel, PCAnnotations and ProcessOptions.]
Fig. 12. Internal representation of deep model PL.

METADEPTH was used to define language families via multi-level modelling in [15], 
but it did not support the definition of closed sets of variability options by means of 
PLs. For this work, we have extended the tool to allow creating deep feature models 
and multi-level models with PCs, and specializing deep model PLs via configurations. 
The extended tool is available at http://metadepth.org/pls. 


Listing 1 specifies the deep model in the right part of Fig. 9, using METADEPTH’s 
syntax. First, line 1 states the name of the deep feature model (defined in Listing 2) associated
with the deep model. Then, line 2 declares the deep model, named ProcessModel,
with level 2. This contains three clabjects: TaskType (lines 3-13), ActorKind (lines 15— 
16) and GatewayType (lines 18—22). PCs are specified as annotations. This is possible 
as, similar to Java [10], METADEPTH permits defining annotation types by providing 
their syntax, parameters, and kind of elements they can annotate (i.e., models, clabjects 
or fields) [40]. This definition is a meta-model, and so, when annotations are parsed, 
they are transformed into an annotation model that refers to the annotated model. Re- 
garding the PC of fields, for usability reasons, our implementation internally conjoins 
the PC of fields with the PC of their owner clabject. For example, the PC of reference 
GatewayType.src is object because the PC of GatewayType is object. 


Listing 2 shows the METADEPTH definition of the deep feature model in Fig. 9. 
This conforms to a meta-model that we have created to represent deep feature models,
and to which we have assigned a concrete syntax similar to the FAMILIAR tool [1].
Line 1 declares a feature model called ProcessOptions with level 2. Line 2 declares the 
root feature ProcessLanguage, and its children features Gateways, Tasks and actors. 
Children features can specify a potency after the “@” symbol, and be declared optional 
using the “?” symbol. Line 3 declares the children of Gateways, which are alternative as 
specified by the keyword alt. Line 4 declares the children of Tasks, which are optional. 

Fig. 12 shows the internal representation of a deep model PL in METADEPTH. The 
PC annotations are automatically converted into an annotation model, which is also 
linked to the deep feature model (ProcessOptions). 

Annotations in METADEPTH can attach actions to be triggered upon certain mod- 
elling events, like instantiation or value assignment. These actions are defined via a 
meta-object protocol (MOP) [26, 40]. This way, we have defined a MOP with actions 
for the PC annotations, to help instantiating deep model PLs. Specifically, when an ele- 
ment of a model with variability is instantiated (like ProcessModel in Listing 1), its PC 
is copied to the instance. Moreover, a constraint forbids instantiating a deep model PL 
if the associated deep feature model has features with potency 0. 

Finally, we have created a command called config to specialize a deep model PL 
via a configuration (see Listing 3). When the command is applied, the PCs attached to 
model elements are evaluated (partially if the configuration is partial), and then removed 
if their value is false. The applied configuration (i.e., the boolean values assigned to the 
features) is stored in the deep feature model itself (cf. model ProcessOptions in Fig. 12). 
Overall, this simple example language already admits 16 total configurations, which can 
be succinctly represented as a PL, increasing its reuse possibilities. 
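
The figure of 16 total configurations can be reproduced by enumerating the assignments that satisfy the formula of Fig. 7(b). The snippet is a plain brute-force count written for illustration; it is not how METADEPTH computes configurations, and the names `FEATURES`, `phi` and `total_configurations` are ours.

```python
from itertools import product

FEATURES = ["ProcessLanguage", "Gateways", "Tasks", "actors",
            "simple", "object", "initial", "enactment"]

def phi(v):
    """Formula of the feature model in Fig. 7(b)."""
    return (v["ProcessLanguage"] and v["Gateways"] and v["Tasks"]
            and ((not v["simple"] and v["object"])
                 or (v["simple"] and not v["object"])))

total_configurations = [
    dict(zip(FEATURES, values))
    for values in product([False, True], repeat=len(FEATURES))
    if phi(dict(zip(FEATURES, values)))
]

print(len(total_configurations))  # 16
```

The three mandatory features are fixed to true, the alternative between simple and object contributes two choices, and the optional features actors, initial and enactment contribute 2 x 2 x 2 = 8, giving 2 x 8 = 16.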


6 Related work 


Next, we review related research coming from language PLs; variability in multi-level 
modelling; and SPLs. 


Language PLs. Some researchers have proposed increasing the reusability of mod- 
elling languages by incorporating SPL techniques. For example, in [47], DSL meta- 
models can be configured using a feature model. In [34], the authors propose featured 
model types: meta-models whose elements have PCs, and with operations that are offered
depending on the chosen variant. In [20], meta-models can have variability, and
their instantiability is analysed at the PL level. However, all these works only consider 
closed variability, while our work also supports open variability through instantiation. 


Variability in multi-level modelling. A plethora of multi-level modelling approaches 
and tools have emerged recently, like DeepTelos [22], FMMLx [18], Melanee [3], Mul- 
tEcore [29], MLT [17] and OMLM [21]. Some of them are based on deep character- 
ization through potency [3, 18,21,29], while others rely on powertypes [17] or most- 
general instances [22]. None of them support variability based on feature models as we 
describe here. However, there have been some attempts to improve multi-level mod- 
elling with SPL techniques, which we describe next. 

Reinhartz-Berger and collaborators [37] present a preliminary proposal to support 
the configuration of classes with optional attributes. It is based on a kernel language
which supports multiple meta-levels but not deep characterization. The proposal is incipient
as it is neither formalized nor implemented. In [9], the authors analyse the limitations
of feature models alone to describe a set of assets, and propose using multi-level
models instead. As multi-level models have limitations to express variability — as de- 
scribed in Section 2.2 — we propose to combine feature models and multi-level models. 

Nesic and collaborators [31] explore the use of MLT [17] to reverse engineer sets of 
related legacy assets into PLs. MLT is a multi-level modelling approach based on pow- 
ertypes and first order logic. In their work, the authors represent variability concepts like 
PCs and product groups within MLT models. This embedding may result in complex 
models where elements can represent either variability concepts or domain concepts. 
Instead, we separate PCs and feature models to avoid cluttering the multi-level model. 
Our goal is to define highly reusable language families, for which we provide feature 
models to describe variability options, and offer the possibility to defer configurations; 
instead, the approach in [31] lacks an explicit representation of feature models. Finally, 
we provide both a theory and a working implementation. 

Other formalizations of potency-based multi-level modelling exist, like [38]. That 
theory does not account for variability, but it could be extended with feature models, in 
a similar way as we do. 


SPLs. Our deferred configurations can be seen as a particular case of staged con- 
figurations [11]. These permit selecting a member of the PL in stages, where each 
stage removes some choices. In our approach, the potency controls the level where 
the variability can be resolved. Staged configurations are also useful in software design 
reuse. In this setting, Kienzle and collaborators [27] propose Concern-Oriented Reuse, 
a paradigm where reusable modules (called concerns) define variability interfaces as 
feature models. The variability of a reused concern can be resolved partially, in which 
case, the undefined features are re-exposed in the interface of the resulting concern. 
We also support deferring the variability resolution, but composing deep model PLs is 
future work. 

Taentzer and collaborators [45] formalized model-based SPLs using category the- 
ory. Different from ours, their formalization does not capture typing (it is within a single 
meta-level), while their morphisms can expand the feature model but cannot be used to 
model partial configurations. Borba and collaborators [8] have studied PL refinements 
to add new products maintaining the behaviour of existing ones. In our case, we do 
not increase variability, but it would be interesting to consider mechanisms to do so 
combined with instantiation. 

To cope with large variability spaces, partitioning techniques can be applied to fea- 
ture models to yield so-called multi-level feature models [11,36]. However, the term 
multi-level does not refer to multiple levels of classification (as in our case), but to 
multiple partitions of a feature model. 

Other modelling notations support variability. For example, Clafer [23] is an ap- 
proach that unifies feature and class modelling. It supports both class and (partial) ob- 
ject models, feature models, (partial) configurations and logic constraints. However, 
it does not support multi-level modelling or deep characterization. Similar to delta-oriented
programming [42], Δ-modelling [41] permits defining a set of products as a
core model plus a set of modification deltas to the core model according to given application
conditions. The approach has been combined with MDE, showing that model
configuration and refinement (e.g., a component being refined by a set of classes) commute.
This is in line with our Theorem 2, but we are interested in instantiation (instead
of refinement), and need to incorporate potency for deep characterization. Hence, in our
case, instantiation and specialization (configuration) do not commute, but the latter can
be advanced to the former.

In the programming world, Batory [7,44] proposes mixin layers, a composition 
mechanism to add features to sets of base classes (so called two-level designs). Higher- 
level designs can be obtained by applying the same techniques. In [7], these higher-level 
designs are called multi-level models. Again, the use of the term multi-level is different 
from ours, which refers to models related by classification relations. 


Overall, our proposal is the first one adding variability to multi-level models with 
support for deep characterization. 


7 Conclusions and future work 


In this paper, we have proposed a new notion of multi-level model PL to improve current 
reuse techniques for modelling languages. This is so as it permits both open variability 
(by successive instantiations leading to language refinements for specific domains), and 
closed variability (by selecting among a set of variants). We have presented a theory, 
with results ensuring the proper interleave of instantiation and configuration steps. The 
ideas have been implemented on top of the multi-level modelling tool METADEPTH.

In the future, we plan to provide a categorical formalization of the theory which 
brings operations like intersection via common parts (pullbacks) and merging (pushouts) 
of deep model PLs. We also want to offer the possibility of extending a deep model PL 
with new features (i.e., extra variability) and move this variability to the top model 
whenever possible. We would like to develop analysis techniques for multi-level model 
PLs, e.g., to check instantiability properties in the line of [20]. Finally, our goal is 
to make multi-level model PLs ready for MDE. This would entail the ability to de- 
fine MDE services like transformations and code generators on multi-level model PLs. 
Technically, our plan is to use the Epsilon languages supported by METADEPTH, and 
follow ideas from existing works on PLs of transformations [13], and transformation of 
PLs [39].

Appendix


Proof of Theorem 1: Given a deep model PL $DM$ and a configuration $C = (F^+, F^-)$,
we build $DM' = (M', DFM', \phi')$ as follows:


- $M'$ has the same level as $M$, and contains the elements $e$ of $M$ s.t. $\phi(e)[F^+/true, F^-/false] \not\equiv false$. Functions $src'$, $tar'$ and $pot'$ are restrictions of $src$, $tar$ and
  $pot$ to the elements in $M'$.
- $DFM' = (l, FM' = (F', \Phi'), pot')$, where $F' = F \setminus (F^+ \cup F^-)$, $\Phi' = \Phi[F^+/true, F^-/false]$, and $pot'$ is the restriction of $pot$ to $F'$.
- $\forall e \in C' \cup S' \cup R' \bullet \phi'(e) = \phi(e)[F^+/true, F^-/false]$.

Now we show that $M'$ is a valid deep model according to Def. 1:


- To check that $src'$ is well formed, we show that $\forall s \in S' \cup R'$, $src'(s)$ is defined. By
  condition 1 in Def. 8, $\phi(s) \Rightarrow \phi(src'(s))$. This precludes the source of any $s \in S' \cup R'$ from being absent from $C'$, since if $\phi(src'(s))[F^+/true, F^-/false] \equiv false$,
  then $\phi(s)[F^+/true, F^-/false] \equiv false$.

- The well-formedness of $tar'$ is shown like in the previous case.

- Function $pot'$ satisfies conditions 1-3 of Def. 1, since $pot$ satisfies them, and $pot'$
  is just a restriction of $pot$.


Now we show that $DM'$ is a valid deep model PL according to Def. 8:


- $M'$ and $DFM'$ have the same level ($l$).
- The three conditions over $\phi'$ and $pot'$ hold, since they hold for $\phi$ and $pot$.


Finally, we build a specialization PL-morphism $sp = (m^D, sp^F)\colon DM' \to DM$
as follows:


- $m^D = (0, inc_C, inc_S, inc_R)$, where $inc_X\colon X' \to X$ (for $X = \{C, S, R\}$) are
  inclusion set morphisms,
- $sp^F = (0, inc_F, C)$, where $inc_F\colon F' \to F$ is an inclusion morphism.

We need to show that: (i) $m_F(F') = F' = F \setminus (F^+ \cup F^-)$, which holds since $F'$ was
defined above as $F \setminus (F^+ \cup F^-)$; and (ii) $\Phi[F^+/true, F^-/false] \equiv \Phi'[F'/inc_F(F')]$,
which holds since $\Phi'$ was defined above as $\Phi[F^+/true, F^-/false]$.


Proof of Theorem 2: Let $C = (F^+, F^-)$ be the configuration of the specialization
PL-morphism $sp\colon DM_2 \to DM_1$. From $DM_0$ and $C$, we construct a deep model PL $DM_3$
and a specialization PL-morphism $sp'\colon DM_3 \to DM_0$ as described in the proof of
Theorem 1. Then, we build a type PL-morphism $tp' = (tp'^D, tp'^F)\colon DM_2 \to DM_3$ as
follows:


- $tp'^D = (1, tp^D_C|_{C_2}, tp^D_S|_{S_2}, tp^D_R|_{R_2})$, with $tp^D_X|_{X_2}$ the restriction of $tp^D_X$ to set $X_2$
  in $DM_2$ (for $X = \{C, S, R\}$).
- $tp'^F = (1, tp^F_F|_{F_2}, C)$ with $tp^F_F|_{F_2}$ the restriction of $tp^F_F$ to set $F_2$.


D-morphism $tp'^D$ is well defined because $\forall c \in C_2, \exists c' \in C_3$ s.t. $tp^D_C(sp^D_C(c)) = sp'^D_C(c')$. This is so as $\phi_1(sp^D_C(c))[F^+/true, F^-/false] \not\equiv false$ due to Def. 9 of
specialization PL-morphism. And now, since the configuration of $tp$ is empty, we have
$\phi_0(tp^D_C(sp^D_C(c)))[F^+/true, F^-/false] \not\equiv false$. This means that, according to Def. 9,
this element is in the co-domain of $sp'^D_C$, and is assigned to $c$ by $tp'^D_C$. The same reasoning
applies to sets $S_2$ and $R_2$. Function $tp^F_F|_{F_2}$ is also well formed, since the same
configuration $C$ was used to derive $DM_2$ and $DM_3$.

This reasoning also shows that $tp \circ sp = sp' \circ tp'$, as Theorem 2 demands.

Abstract. The goal of probabilistic static analysis is to quantify the
probability that a given program satisfies/violates a required property
(assertion). In this work, we use static analysis by abstract interpretation
and model counting to construct a probabilistic analysis of deterministic
programs with uncertain input data, which can be used for estimating
the probabilities of assertions (program reliability).

In particular, we automatically infer necessary preconditions for a
given assertion to be satisfied/violated at run-time, using a combination of
forward and backward static analyses. The focus is on numeric properties
of variables and numeric abstract domains, such as polyhedra. The obtained
preconditions, in the form of linear constraints, are then analyzed to
quantify how likely an input is to satisfy them. Model counting techniques
are employed to count the number of solutions that satisfy given linear
constraints. These counts are then used to assess the probability that the
target assertion is satisfied/violated. We also present how to extend our
approach to analyze non-deterministic programs by inferring sufficient
preconditions. We have built a prototype implementation and evaluated it on
several interesting examples.


1 Introduction 


Program verification is often concerned only with determining whether an assertion
always holds at a given program point. However, there are many applications
where we need more fine-grained information about how likely a target
assertion (event) is to be satisfied/violated. Examples of other target events
include the invocation of a certain method, the access to confidential information,
etc. In those cases, we want to distinguish between what is a possible event (even
with extremely low probability) and what is a likely event (possible with higher
probability). In this work, we show how to calculate the reliability of programs
by using a combination of static analysis by abstract interpretation and model
counting. In particular, we are interested in learning how the presence of uncertainty
in the inputs can affect the probability of assertions at the exit of the program.
This is an important problem to consider, since uncertainty is a common aspect of
many real-world software systems today (e.g., in the medical and aerospace domains).


Abstract interpretation [6,7,8] is a general theory for approximating the 
semantics of programs. It provides safe (all answers are correct) and efficient 
(with a good trade-off between precision and cost) static analyses of run-time 
properties of real programs. It is based on the idea of approximations between 
concrete and abstract domains of program properties. Its practical success is 
mainly enabled by the design of numerical abstract domains, which reason on 
numerical properties of variables. For example, the interval domain [6], which 
is non-relational, infers the information about the possible values of individual 
variables; the octagon domain [25], which is weakly relational, infers unit binary 
linear constraints between program variables; and the polyhedra domain [10], 
which is fully relational, infers the linear constraints between all program variables. 
Abstract interpretation is a powerful technique for deriving approximate, albeit 
computable analyses, by using fully automatic algorithms. These abstract analyses 
pay the price for finite computability (always terminate) by an inevitable loss of 
precision. We use abstract analyses for automatic inference of (over-approximated) 
invariants by forward analysis, and (over-approximated) necessary preconditions 
by backward analysis. These two abstract analyses can be combined such that the 
results of the first analysis refine the results of the second one. In this work, we use 
a combination of forward and backward analyses to automatically generate the 
necessary preconditions on input variables that lead to the satisfaction/violation 
of a given assertion. If the obtained preconditions are satisfied by some concrete values
for input variables, then they represent input values that will allow the given 
assertion to be definitely satisfied/violated by all program executions branching 
from them. In fact, we run two backward analyses: the first one determines 
necessary preconditions for the given assertion to be satisfied, while the second 
one determines necessary preconditions for the given assertion to be violated. 


Model counting is the problem of determining the number of solutions of 
a given constraint (formula). The LATTE tool [1] implements state-of-the-art 
algorithms for computing volumes, both real and integral, of convex polytopes as 
well as integrating functions over those polytopes. More specifically, we use the 
LATTE tool to estimate algorithmically the exact number of points of a bounded 
(possibly very large) discrete domain that satisfy given linear constraints. 


In this paper, we describe a method which uses abstract interpretation-based
static analysis and model counting to perform a specific type of quantitative
analysis of deterministic programs, namely the calculation of program reliability.
The preconditions that ensure satisfaction/violation of a given assertion are
given in the form of linear constraints between variables, i.e. elements of the
polyhedra domain. Calculating the program reliability involves counting the
number of solutions to these preconditions by model counting, and dividing it
by the size of the total space of input values. We assume that the input values
are uniformly distributed within their finite discrete domain. Since the set of
generated preconditions represents an over-approximation, we compute the
reliability of programs as upper and lower bounds of the exact probabilities that
a given assertion is satisfied or violated. The reported uncertainty is due to the
approximation inherent in abstract interpretation, which is introduced in order
to obtain a scalable and fully automatic analysis.

The focus here is on programs whose input values range over finite discrete 
domains. Thus, we obtain a finite input domain and so we can use model counting 
algorithms to compute the required probabilities. We also restrict ourselves to 
the domain of linear integer arithmetic, since this is supported by LATTE and 
the polyhedra numeric domain we use. 

We also consider an extension of our approach to non-deterministic programs. 
For non-deterministic programs, sufficient and necessary preconditions no longer
coincide [26]. Sufficient preconditions ensure that the target invariant holds for 
all sequences of non-deterministic choices made at each execution step, whereas 
necessary preconditions ensure that the target invariant holds for at least one 
sequence of non-deterministic choices made at each execution step. In effect, 
increasing the non-determinism will reduce the set of sufficient preconditions and 
enlarge the set of necessary preconditions. Hence, for non-deterministic programs 
we construct backward analyses for inferring (under-approximated) sufficient 
preconditions that lead to the satisfaction/violation of a given assertion. The 
calculation of reliability is then similar to the one for deterministic programs. 

We have developed a prototype probabilistic static analyzer which uses the 
APRON library [21] to implement numeric property domains and the LATTE 
tool [1] to implement model counting algorithms. APRON provides a common 
high-level API to the most common numerical property domains, such as intervals, 
octagons, and polyhedra. We have implemented a combination of forward and 
backward analyses of deterministic (resp., non-deterministic) C programs for the 
automatic inference of invariants and necessary (resp. sufficient) preconditions 
in all program points. Our static analyzer has two components: (1) it computes 
the required preconditions in the input program point for a given assertion to be 
satisfied/violated, and (2) it then calls LATTE to count the number of solutions 
of those preconditions and calculates the program reliability. 

The main contributions of this work are: 


— We demonstrate how to calculate the program reliability of deterministic and 
non-deterministic programs using static analysis by abstract interpretation 
and model counting. 

— We develop a probabilistic static analyzer, which uses numerical property 
domains from the APRON library and the LATTE model counting tool. 

— Finally, we evaluate our method for probabilistic static analysis of C programs 
and show how to handle a set of small but compelling benchmarks. 


2 Motivating Examples 


Consider the program $P_1$ in Fig. 1. Suppose that the initial value of i ranges over
the integer domain [0,19], and that it is independently and uniformly
distributed across this range. When $(i \geq 10)$ the variable k is assigned 12,
otherwise k is assigned 50. A forward invariant analysis will find the invariant
void main() {
  1: int i:=[0,19]; l_input:
  2: int k:=0;
  3: if (i >= 10) k:=12; else k:=50;
  l_final: assert (k < 30);
}
Fig. 1: The program $P_1$

void main() {
  1: int j:=[0,9]; l_input:
  2: int i:=0;
  3: while (i < 100) {
  4:   i:=i+1;
  5:   j:=j+1; }
  l_final: assert (j <= 105);
}
Fig. 2: The program $P_2$


$k = 12 \lor k = 50$ at point $l_{final}$. Therefore, the assertion $(k < 30)$ can be satisfied
(when $k = 12$) and can be violated (when $k = 50$). We are interested in inferring
necessary preconditions on the input state at control point $l_{input}$ for the
assertion to be satisfied and for the assertion to be violated. We back-propagate
necessary conditions of satisfaction and violation of the assertion from point $l_{final}$
to $l_{input}$. A backward necessary condition analysis will infer the precondition
$i \geq 10$ at point $l_{input}$ assuming that the assertion is satisfied, and the precondition
$i < 10$ at point $l_{input}$ assuming that the assertion is violated. The size of the
input domain is 20, since $i \in [0,19]$. By calling LATTE to count the number of
solutions to the above preconditions, we can calculate that the probability for the
assertion to be satisfied (success probability) is $\frac{10}{20} = 50\%$, and the probability
for the assertion to be violated (failure probability) is $\frac{10}{20} = 50\%$.


Consider the program $P_2$ in Fig. 2. A forward invariant analysis will find the
invariant $100 \leq j \leq 109$ at point $l_{final}$, so the corresponding assertion can be
satisfied (when $100 \leq j \leq 105$) and can be violated (when $105 < j \leq 109$). A
backward necessary condition analysis will infer the precondition $0 \leq j \leq 5$ at
point $l_{input}$ for the assertion to be satisfied, and the precondition $5 < j \leq 9$ at
point $l_{input}$ for the assertion to be violated. Therefore, we can calculate that the
success probability is $\frac{6}{10} = 60\%$, and the failure probability is $\frac{4}{10} = 40\%$.
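The probabilities quoted for the two motivating programs can be cross-checked by exhaustively running each program on every input. The sketch below is only a sanity check, not the analysis described in this paper; it assumes the branch condition of $P_1$ is i >= 10 and the assertion of $P_2$ is j <= 105, which are the readings consistent with the reported 50% and 60% figures. The function names are illustrative.

```python
from fractions import Fraction


def run_p1(i):
    """Run P1 on one input; return True iff assert (k < 30) holds."""
    k = 12 if i >= 10 else 50  # assumed branch condition: i >= 10
    return k < 30


def run_p2(j):
    """Run P2 on one input; return True iff the final assertion holds."""
    i = 0
    while i < 100:
        i += 1
        j += 1
    return j <= 105  # assumed assertion: j <= 105


def success_probability(run, inputs):
    """Fraction of uniformly distributed inputs for which the assertion holds."""
    inputs = list(inputs)
    hits = sum(1 for value in inputs if run(value))
    return Fraction(hits, len(inputs))


p1_success = success_probability(run_p1, range(0, 20))  # i in [0,19]
p2_success = success_probability(run_p2, range(0, 10))  # j in [0,9]
print("P1:", p1_success, 1 - p1_success)
print("P2:", p2_success, 1 - p2_success)
```

Exhaustive execution is feasible here only because the input domains are tiny; the approach in this paper replaces it by precondition inference followed by model counting.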


3 Forward-Backward Precondition Analyses 


We describe the combination of forward and backward analyses in the framework
of abstract interpretation for inferring necessary preconditions for a given
assertion to be satisfied/violated. The principle of the combination is to use the
result of the forward invariant analysis in the subsequent backward necessary 
condition analysis in order to get more precise results which are still sound. 


Syntax. We consider a simple deterministic programming language that is a
subset of C, which will be used to exemplify our work. The control point (location)
before each statement and at the end of each block is associated with a unique label
$l \in L$. The syntax of the language is given by:


$$s ::= \texttt{skip} \mid x := e \mid x := [n, n'] \mid s; s \mid \texttt{if}\ (e)\ \texttt{then}\ s\ \texttt{else}\ s \mid \texttt{while}\ (e)\ \texttt{do}\ s \mid \texttt{assert}(e)$$


$$e ::= n \mid x \mid e \oplus e$$


186 A. S. Dimovski and A. Legay 


where $n$ ranges over integers, $[n, n']$ ranges over integer intervals, $x$ ranges over
variable names Var, and $\oplus$ over arithmetic-logic operators. The non-deterministic
interval assignment $x := [n, n']$ represents an input statement which assigns to the
input variable $x$ a uniformly distributed random value from the interval $[n, n']$.
This interval assignment can occur only in the input section of the program,
and is used to model input uncertainties. The set of all generated statements $s$
is denoted by Stm, whereas the set of all expressions $e$ is denoted by Exp. We
assume $l_{input}$ is the location after the input statements (i.e. it denotes the end of
the input section) and $l_{final}$ is the location at the end of the program, where an
assertion $\texttt{assert}(e_f)$ is posed. Without loss of generality, a program is a sequence
of statements followed by a single assertion.


Concrete semantics. A program state is given by a control location in $L$ and
an environment in $E = Var \to \mathbb{Z}$ mapping each variable to its value (an integer).
We write $\Sigma = L \times E$ to denote the set of all possible program states.
Programs are modelled as transition systems $(\Sigma, \to)$, where $\Sigma$ is a set of states
and $\to \subseteq \Sigma \times \Sigma$ is a transition relation modelling atomic execution steps. The
relation $\to$ is defined by local rules, such as the following:


assignment $l_0 : x := e;\ l_1$ :: $(l_0, \rho) \to (l_1, \rho[x \mapsto [\![e]\!](\rho)])$, where $[\![e]\!](\rho) \in \mathbb{Z}$ is
the result of the evaluation of $e$ in the environment $\rho$, and $\rho[x \mapsto n]$ denotes
the environment that updates $\rho$ at variable $x$ with the value $n$.

input $l_0 : x := [n, n'];\ l_1$ :: $(l_0, \rho) \to (l_1, \rho[x \mapsto n''])$, where $n'' \in [n, n']$.

conditional $l_0 : \texttt{if}\ (e)\ \texttt{then}\ \{l_0^t : s;\ l_1^t\}\ \texttt{else}\ \{l_0^f : s';\ l_1^f\};\ l_1$ :: $(l_0, \rho) \to (l_0^t, \rho)$
if $[\![e]\!](\rho) \neq 0$ (see footnote 3), $(l_0, \rho) \to (l_0^f, \rho)$ if $[\![e]\!](\rho) = 0$, $(l_1^t, \rho) \to (l_1, \rho)$, and
$(l_1^f, \rho) \to (l_1, \rho)$.

loop $l_0 : \texttt{while}\ (e)\ \texttt{do}\ \{l_0^t : s;\ l_1^t\};\ l_1$ :: $(l_0, \rho) \to (l_0^t, \rho)$ if $[\![e]\!](\rho) \neq 0$,
$(l_0, \rho) \to (l_1, \rho)$ if $[\![e]\!](\rho) = 0$, and $(l_1^t, \rho) \to (l_0, \rho)$ (see footnote 4).


Let $\mathbb{E} \subseteq E$ be the set of input environments obtained after executing the
input statements. The set of input states is $\mathcal{I} = \{(l_{input}, \rho) \mid \rho \in \mathbb{E}\}$. The
invariant inference (reachability) problem consists of finding out the possible
environments (values of all variables) that may arise at each control location.
The concrete semantic domain is the complete lattice of the powerset of states
$(\mathcal{P}(\Sigma), \subseteq, \cup, \cap, \emptyset, \Sigma)$, and the concrete semantics in the form of invariant states
encountered branching from $\mathcal{I}$, denoted $inv(\mathcal{I})$, is:


$$inv(\mathcal{I}) = \mathrm{lfp}_{\mathcal{I}}\, \lambda X.\ X \cup post(X)$$


where $post(X) = \{\sigma \in \Sigma \mid \exists \sigma' \in X.\ \sigma' \to \sigma\}$ and $\mathrm{lfp}_{\mathcal{I}} f$ is the least fixed point
of the function $f$ greater than $\mathcal{I}$.

In this work, we consider the problem of inferring necessary preconditions.
Assume that a program exits with $l_{final} : \texttt{assert}(e_f)$. We want to distinguish


3 Following the convention popularized by C, we model Boolean values as integers, 
with zero interpreted as false and everything else as true. 
4 Note that control moves from the final label $l_1^t$ of $s$ to the initial label $l_0$ of while.
between program termination that leads to the satisfaction of the final assertion
at $l_{final}$ and program termination that leads to the violation of the final assertion at $l_{final}$.
Let $F_{sat} = \{(l, \rho) \in inv(\mathcal{I}) \mid l = l_{final} \implies [\![e_f]\!](\rho) \neq 0\}$ and
$F_{viol} = \{(l, \rho) \in inv(\mathcal{I}) \mid l = l_{final} \implies [\![e_f]\!](\rho) = 0\}$ be the invariant sets which enforce the
assertion at the point $l_{final}$ to be satisfied and violated, respectively, and coincide
with $inv(\mathcal{I})$ everywhere else. In the following, $F$ may represent either $F_{sat}$ or
$F_{viol}$. Given an invariant set $F$ to obey, we want to infer the set of input states
$cond(F)$ that guarantee that all program executions stay in $F$:


$$cond(F) = \mathrm{gfp}_F\, \lambda X.\ X \cap pre(X)$$


where $pre(X) = \{\sigma \in \Sigma \mid \exists \sigma' \in X.\ \sigma \to \sigma'\}$ is the set of predecessors of $X$, and
$\mathrm{gfp}_F f$ is the greatest fixed point of the function $f$ smaller than $F$. The above
two fixed points (lfp and gfp) exist according to Tarski, as the corresponding
functions are monotone and continuous in the complete lattice of state sets.


Given a set of input environments $\mathbb{E} \subseteq E$, we can compute the subsets $\mathbb{E}_{sat}$
and $\mathbb{E}_{viol}$ of input environments that lead to satisfaction and violation of the
final assertion as:


$$\mathbb{E}_{sat} = \{\rho \mid (l_{input}, \rho) \in cond(F_{sat})\}, \qquad \mathbb{E}_{viol} = \{\rho \mid (l_{input}, \rho) \in cond(F_{viol})\}$$
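For a finite transition system, the two fixed points above can be computed directly by iteration. The following sketch does this for a toy encoding of the program of Fig. 1. Several things here are assumptions made for illustration: the state encoding (label, i, k), the label names, the branch condition i >= 10, and in particular the choice to let final states loop on themselves so that the existential pre does not discard them.

```python
def step(state):
    """Successor states of (label, i, k) in a toy encoding of Fig. 1."""
    label, i, k = state
    if label == "input":   # statement 2: int k:=0
        return {("test", i, 0)}
    if label == "test":    # statement 3: if (i >= 10) k:=12; else k:=50 (assumed condition)
        return {("final", i, 12 if i >= 10 else 50)}
    return {state}         # assumption: final states stutter


def inv(initial):
    """Least fixed point of X -> X ∪ post(X) starting from the input states."""
    reached, frontier = set(initial), set(initial)
    while frontier:
        successors = set().union(*(step(s) for s in frontier))
        frontier = successors - reached
        reached |= frontier
    return reached


def cond(invariant):
    """Greatest fixed point of X -> X ∩ pre(X) below the given invariant set."""
    current = set(invariant)
    while True:
        refined = {s for s in current if step(s) & current}
        if refined == current:
            return current
        current = refined


input_states = {("input", i, 0) for i in range(20)}  # i in [0,19]
reachable = inv(input_states)
f_sat = {s for s in reachable if s[0] != "final" or s[2] < 30}
f_viol = {s for s in reachable if s[0] != "final" or not s[2] < 30}
e_sat = sorted(i for (label, i, k) in cond(f_sat) if label == "input")
e_viol = sorted(i for (label, i, k) in cond(f_viol) if label == "input")
print(e_sat, e_viol)
```

Under these assumptions the two resulting input sets partition [0,19] into ten values each, in line with the 50% figures of Section 2.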


Abstract semantics. Transition systems can become large or infinite for real
programs, so that neither $inv(\mathcal{I})$ nor $cond(F)$ can be computed at all. Therefore,
we seek sound approximations. The actual computable abstract analyses can
be defined as over-approximations of the concrete semantics. A static analyzer will
infer over-approximated necessary preconditions so that all program executions
that lead to satisfaction (resp., violation) of the final assertion are taken into
account, thus computing an over-approximation of $\mathbb{E}_{sat}$ (resp., $\mathbb{E}_{viol}$).

We consider an abstract domain $(D, \sqsubseteq_D)$, such that there exists a Galois
connection (see footnote 5) $(\mathcal{P}(E), \subseteq) \underset{\alpha_D}{\overset{\gamma_D}{\leftrightarrows}} (D, \sqsubseteq_D)$. We assume that the abstract domain
$D$ is equipped with sound operators for ordering $\sqsubseteq_D$, least upper bound (join)
$\sqcup_D$, greatest lower bound (meet) $\sqcap_D$, bottom $\bot_D$, top $\top_D$, widening $\nabla_D$, and
narrowing $\Delta_D$, as well as sound transfer functions for assignments
$assign_D : Var \times Exp \times D \to D$, tests $filter_D : Exp \times D \to D$, and backward assignments
$b\text{-}assign_D : Var \times Exp \times D \times D \to D$. We let $\mathrm{lfp}^\#$ (resp., $\mathrm{gfp}^\#$) denote an
abstract post-fixpoint (resp., pre-fixpoint) operator, derived using widening $\nabla_D$
and narrowing $\Delta_D$, that over-approximates the concrete lfp (resp., gfp) [8].
Finally, the concrete domain on which the concrete semantics is defined, $(\mathcal{P}(\Sigma), \subseteq)$, is
abstracted using a Galois connection $(\mathcal{P}(\Sigma), \subseteq) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} (L \to D, \dot{\sqsubseteq})$, where
$\alpha(R) = \lambda l \in L.\ \bigsqcup_D \{d \in D \mid (l, \rho) \in R,\ \alpha_D(\{\rho\}) = d\}$. Hence, each control point $l \in L$ is
associated with an element $d \in D$ in the abstract semantics.
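To make the role of these operators concrete, here is a minimal sketch of an abstract domain with ordering, join, meet and widening, together with a widening-based fixpoint iteration for the loop counter i of Fig. 2. It uses the interval domain rather than the polyhedra domain employed in this work, purely to keep the example short; the class and function names are illustrative, and a single decreasing iteration stands in for narrowing.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def is_bottom(self):
        return self.lo > self.hi

    def leq(self, other):
        """Abstract ordering: interval inclusion."""
        return self.is_bottom() or (other.lo <= self.lo and self.hi <= other.hi)

    def join(self, other):
        if self.is_bottom():
            return other
        if other.is_bottom():
            return self
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def meet(self, other):
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def widen(self, other):
        """Standard interval widening: unstable bounds jump to infinity."""
        if self.is_bottom():
            return other
        if other.is_bottom():
            return self
        lo = self.lo if self.lo <= other.lo else -math.inf
        hi = self.hi if self.hi >= other.hi else math.inf
        return Interval(lo, hi)

    def shift(self, n):
        """Abstract effect of i := i + n."""
        return self if self.is_bottom() else Interval(self.lo + n, self.hi + n)


def loop_head_step(head):
    """One abstract iteration at the head of: i:=0; while (i < 100) i:=i+1."""
    body = head.meet(Interval(-math.inf, 99)).shift(1)  # filter i < 100, then i:=i+1
    return Interval(0, 0).join(body)


head = Interval(0, 0)
while True:
    candidate = loop_head_step(head)
    if candidate.leq(head):       # post-fixpoint reached
        break
    head = head.widen(candidate)  # widening forces termination
head = loop_head_step(head)       # one decreasing step, standing in for narrowing
loop_exit = head.meet(Interval(100, math.inf))  # filter not (i < 100)
print(head, loop_exit)
```

Widening first overshoots the upper bound to infinity; the decreasing step then recovers a finite bound at the loop head, which illustrates why both operators appear in the definition of the abstract fixpoint operators.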


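5 $(L, \leq_L) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} (M, \leq_M)$ is a Galois connection between complete lattices $L$ and $M$
iff $\alpha$ and $\gamma$ are total functions that satisfy: $\alpha(l) \leq_M m \iff l \leq_L \gamma(m)$ for all
$l \in L, m \in M$. Here $\leq_L$ and $\leq_M$ are the pre-order relations for $L$ and $M$, respectively.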
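We define a family of forward transfer functions $\overrightarrow{\delta}_{l,l'} : D \to D$ that compute
the effect of any concrete transition at the abstract level. The definition of $\overrightarrow{\delta}_{l,l'}$
for some statements is:

assignment $l_0 : x := e;\ l_1$ :: $\overrightarrow{\delta}_{l_0,l_1}(d) = assign_D(x, e, d)$.

conditional $l_0 : \texttt{if}\ (e)\ \texttt{then}\ \{l_0^t : s;\ l_1^t\}\ \texttt{else}\ \{l_0^f : s';\ l_1^f\};\ l_1$ :: $\overrightarrow{\delta}_{l_0,l_0^t}(d) = filter_D(e, d)$,
$\overrightarrow{\delta}_{l_0,l_0^f}(d) = filter_D(\neg e, d)$, $\overrightarrow{\delta}_{l_1^t,l_1}(d) = d$, and $\overrightarrow{\delta}_{l_1^f,l_1}(d) = d$.

loop $l_0 : \texttt{while}\ (e)\ \texttt{do}\ \{l_0^t : s;\ l_1^t\};\ l_1$ :: $\overrightarrow{\delta}_{l_0,l_0^t}(d) = filter_D(e, d)$,
$\overrightarrow{\delta}_{l_0,l_1}(d) = filter_D(\neg e, d)$, and $\overrightarrow{\delta}_{l_1^t,l_0}(d) = d$.


The soundness of $\{\overrightarrow{\delta}_{l,l'} \mid l, l' \in L\}$ is written as:
$\forall d \in D, \forall \rho \in \gamma_D(d),\ (l, \rho) \to (l', \rho') \implies \rho' \in \gamma_D(\overrightarrow{\delta}_{l,l'}(d))$.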

Suppose that the abstract element $\alpha_D(\mathbb{E}) = d_{input} \in D$ is at the input control
point $l_{input}$. We can collect the abstractions of possible environments at each
program control point using the following forward interpreter:


$$\overrightarrow{F}^\# = \lambda I^\#.\ \lambda (l \in L).\ \bigsqcup_D \{\overrightarrow{\delta}_{l',l}(I^\#(l')) \mid l' \in L\}$$


such that the result of the forward analyzer is $I^\# = \mathrm{lfp}^\#_{I_0} \overrightarrow{F}^\#$, where $I_0(l_{input}) = d_{input}$.
Assume that $(\mathrm{lfp}^\#_{I_0} \overrightarrow{F}^\#)(l_{final}) = d_{final}$. Let $d^{sat}_{final} = filter_D(e_f, d_{final})$ and
$d^{viol}_{final} = filter_D(\neg e_f, d_{final})$. We want to design two backward abstract interpreters
that propagate backwards the invariants ensuring that the final assertion is
satisfied, $d^{sat}_{final}$, and violated, $d^{viol}_{final}$, respectively. The backward interpreters refine
the invariants found by $\overrightarrow{F}^\#$. Thus, they take two elements of $D$ as inputs: an
invariant to refine and an invariant to propagate backwards. They are based
on a family of backward transfer functions $\overleftarrow{\delta}_{l,l'} : D \times D \to D$, which map
a precondition to refine and a postcondition into a refined precondition. The
definition of $\overleftarrow{\delta}_{l,l'}$ for some statements is:


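assignment $l_0 : x := e;\ l_1$ :: $\overleftarrow{\delta}_{l_0,l_1}(d, d') = b\text{-}assign_D(x, e, d, d')$.

conditional $l_0 : \texttt{if}\ (e)\ \texttt{then}\ \{l_0^t : s;\ l_1^t\}\ \texttt{else}\ \{l_0^f : s';\ l_1^f\};\ l_1$ :: $\overleftarrow{\delta}_{l_0,l_0^t}(d, d') = d \sqcap_D d'$,
$\overleftarrow{\delta}_{l_0,l_0^f}(d, d') = d \sqcap_D d'$, $\overleftarrow{\delta}_{l_1^t,l_1}(d, d') = d \sqcap_D d'$, and $\overleftarrow{\delta}_{l_1^f,l_1}(d, d') = d \sqcap_D d'$.

loop $l_0 : \texttt{while}\ (e)\ \texttt{do}\ \{l_0^t : s;\ l_1^t\};\ l_1$ :: $\overleftarrow{\delta}_{l_0,l_0^t}(d, d') = d \sqcap_D d'$, $\overleftarrow{\delta}_{l_0,l_1}(d, d') = d \sqcap_D d'$,
and $\overleftarrow{\delta}_{l_1^t,l_0}(d, d') = d \sqcap_D d'$.


The soundness of $\{\overleftarrow{\delta}_{l,l'} \mid l, l' \in L\}$ is written as:
$\forall d, d' \in D, \forall \rho \in \gamma_D(d), \rho' \in \gamma_D(d'),\ (l, \rho) \to (l', \rho') \implies \rho \in \gamma_D(\overleftarrow{\delta}_{l,l'}(d, d'))$.
That is, $d$ is refined into a stronger precondition by taking into account the postcondition $d'$.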
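Suppose that $F^\#_{sat}(l_{final}) = d^{sat}_{final}$, $F^\#_{viol}(l_{final}) = d^{viol}_{final}$, and
$F^\#_{sat}(l) = F^\#_{viol}(l) = I^\#(l)$ for $l \neq l_{final}$. The backward interpreters are defined as:


$$\overleftarrow{F}^\# = \lambda (I^\#, F^\#).\ \lambda (l \in L).\ \bigsqcup_D \{\overleftarrow{\delta}_{l,l'}(I^\#(l), F^\#(l')) \mid l' \in L\}$$


such that the results of the two backward analyzers are $C^\#_{sat} = \mathrm{gfp}^\#_{(I^\#, F^\#_{sat})} \overleftarrow{F}^\#$
and $C^\#_{viol} = \mathrm{gfp}^\#_{(I^\#, F^\#_{viol})} \overleftarrow{F}^\#$. The necessary preconditions that the final assertion
is satisfied and violated are $d^{sat}_{input} = C^\#_{sat}(l_{input})$ and $d^{viol}_{input} = C^\#_{viol}(l_{input})$,
respectively. We can now compute the over-approximated sets $\mathbb{E}^\#_{sat}$ and $\mathbb{E}^\#_{viol}$ of
input environments $\mathbb{E}_{sat}$ and $\mathbb{E}_{viol}$ that lead to satisfaction and violation of the
final assertion as:


$$\mathbb{E}^\#_{sat} = \mathbb{E} \cap \gamma_D(d^{sat}_{input}), \qquad \mathbb{E}^\#_{viol} = \mathbb{E} \cap \gamma_D(d^{viol}_{input})$$


such that $\mathbb{E}^\#_{sat} \supseteq \mathbb{E}_{sat}$ and $\mathbb{E}^\#_{viol} \supseteq \mathbb{E}_{viol}$.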

Polyhedra numeric abstract domain. Although the abstract domain $D$ can be
instantiated with different property domains, in the following we will use the
polyhedra numerical abstract domain for $D$. This is due to the fact that only for
the polyhedra domain are all necessary abstract operations and transfer functions,
such as $assign_D$, $b\text{-}assign_D$, $b\text{-}assign\text{-}under_D$ (see Section 5), implemented
in the APRON library. The Polyhedra domain [10], denoted as $(P, \sqsubseteq_P)$, is a fully
relational numerical property domain, which allows manipulating conjunctions
of linear inequalities of the form $\alpha_1 x_1 + \ldots + \alpha_n x_n \geq \beta$, where $x_1, \ldots, x_n$ are
program variables and $\alpha_i, \beta \in \mathbb{R}$ (reals). The abstract operations of the Polyhedra
domain are defined in [10]. Polyhedra analysis is expensive but also very precise.

A property element is represented as a conjunction of linear constraints given
in the matrix form $(A, b)$ that consists of a matrix $A \in \mathbb{R}^{m \times n}$ and a vector
$b \in \mathbb{R}^m$, where $n$ is the number of variables and $m$ is the number of constraints.
This is called the constraint representation of polyhedra elements; there is also
a so-called generator representation. One representation can be converted
to the other using Chernikova's algorithm [5]. Some domain operations
can be performed more efficiently using the generator representation only, others
using the constraint representation, and some making use of both. We now
present some operations that can be defined using the constraint representation.

The concretization function is: $\gamma_P((A, b)) = \{v \in \mathbb{R}^n \mid A \cdot v \geq b\}$. The meet
$\sqcap_P$ is defined as: $(A_1, b_1) \sqcap_P (A_2, b_2) = \left(\binom{A_1}{A_2}, \binom{b_1}{b_2}\right)$. We also need widening,
since the polyhedra domain has infinite strictly increasing chains:


$$(A_1, b_1) \nabla_P (A_2, b_2) = \{c \in (A_1, b_1) \mid (A_2, b_2) \sqsubseteq_P \{c\}\}$$


where $c$ represents one constraint from $(A_1, b_1)$. The transfer function $filter_P$
abstracts affine inequality expressions by adding them to the input polyhedra.


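$$filter_P\Big(\textstyle\sum_i \alpha_i x_i \geq \beta,\ (A, b)\Big) = \left(\binom{A}{\alpha_1 \ldots \alpha_n}, \binom{b}{\beta}\right)$$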
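The constraint-representation operations described above can be sketched in a few lines. In this sketch a polyhedron is a pair of Python lists (A, b) read as A·v >= b; this encoding and the function names are assumptions made for illustration and are not the APRON API. Widening is omitted because it needs an entailment check between polyhedra.

```python
def meet(p1, p2):
    """Meet of two polyhedra: stack the constraint rows of both."""
    (a1, b1), (a2, b2) = p1, p2
    return (a1 + a2, b1 + b2)


def filter_geq(poly, coeffs, beta):
    """Add the affine constraint coeffs . v >= beta to the polyhedron."""
    a, b = poly
    return (a + [list(coeffs)], b + [beta])


def contains(poly, v):
    """Membership in the concretization: every row satisfies A_row . v >= b_row."""
    a, b = poly
    return all(sum(c * x for c, x in zip(row, v)) >= rhs for row, rhs in zip(a, b))


# Variables are ordered (i, k). The input element 0 <= i <= 19 becomes
# i >= 0 and -i >= -19.
d_input = ([[1, 0], [-1, 0]], [0, -19])
d_sat = filter_geq(d_input, [1, 0], 10)  # additionally require i >= 10
sat_points = [i for i in range(-5, 30) if contains(d_sat, (i, 0))]
same_via_meet = meet(d_input, ([[1, 0]], [10]))
print(sat_points)
```

Adding a test constraint and taking the meet with a one-row polyhedron give the same constraint system, which is why the two operations share an implementation in this representation.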


Example 1. Consider the program $P_1$ from Fig. 1. Assume that $D$ is the polyhedra
domain. The input abstract element is $d_{input} = (0 \leq i \leq 19)$. Using the forward
analyzer $\overrightarrow{F}^\#$, we obtain $d_{final} = (k = 12 \lor k = 50)$, and so $d^{sat}_{final} = (k = 12)$ and
$d^{viol}_{final} = (k = 50)$. Using the backward analyzers $\overleftarrow{F}^\#$, we obtain $d^{sat}_{input} = (i \geq 10)$
and $d^{viol}_{input} = (i < 10)$.
4 Computing Success and Failure Probabilities


The overall goal of our approach is to answer questions about the probability 
of assertions at the exit of a deterministic program P. We define the success 
probability as the probability that a program terminates successfully with the 
target assertion being satisfied. The failure probability is the probability that a 
program hits a failure caused by the target assertion being violated. 

The combination of forward and two backward analyses infers the necessary
preconditions, denoted $d^{sat}_{input} = C^\#_{sat}(l_{input})$ and $d^{viol}_{input} = C^\#_{viol}(l_{input})$, that the
target assertion is satisfied and violated, respectively. Calculating the likelihood of
satisfying/violating the given assertion involves counting the number of solutions
to $d^{sat}_{input}$ and $d^{viol}_{input}$ and dividing it by the total number of possible values in the input
domain $\mathbb{E}$. In particular, we use model counting techniques and the LATTE tool [1] to
compute algorithmically the exact number of points of a bounded (possibly very
large) discrete domain $\mathbb{E}$ that satisfy the (linear) constraints $d^{sat}_{input}$ and $d^{viol}_{input}$.
We restrict our attention to programs that have finite input domains $\mathbb{E}$ and to
numeric abstract elements from the polyhedra domain expressed as linear integer
arithmetic (LIA) constraints over program variables whose values are uniformly
distributed over their input domain.

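We use the LATTE tool to compute the number of elements of $\mathbb{E}$ that satisfy
$d^{sat}_{input}$ and $d^{viol}_{input}$, denoted $\#(d^{sat}_{input})$ and $\#(d^{viol}_{input})$. The size of $\mathbb{E}$, denoted $\#(\mathbb{E})$,
is the product of the domain sizes of all input variables in program $P$. Thus, we have:
$\#(\mathbb{E}) = \prod_{x := [n, n'] \in P} |n' - n + 1|$. Note that the exact sets of input states that lead
to satisfaction and violation of the given assertion are $\mathbb{E}_{sat}$ and $\mathbb{E}_{viol}$, and their
sizes are denoted $\#(\mathbb{E}_{sat})$ and $\#(\mathbb{E}_{viol})$. Since the found necessary preconditions
$d^{sat}_{input}$ and $d^{viol}_{input}$ are over-approximations of $\mathbb{E}_{sat}$ and $\mathbb{E}_{viol}$, respectively, we have
$\#(\mathbb{E}_{sat}) \leq \#(d^{sat}_{input})$ and $\#(\mathbb{E}_{viol}) \leq \#(d^{viol}_{input})$. Moreover, the input environments
which are not in $\gamma_D(d^{sat}_{input})$, that is they are in $\mathbb{E} \setminus \gamma_D(d^{sat}_{input})$, definitely lead to
the violation of the assertion. Therefore, we have
$\#(\mathbb{E}) - \#(d^{sat}_{input}) \leq \#(\mathbb{E}_{viol}) \leq \#(d^{viol}_{input})$. By similar reasoning as above, we can also establish that
$\#(\mathbb{E}) - \#(d^{viol}_{input}) \leq \#(\mathbb{E}_{sat}) \leq \#(d^{sat}_{input})$. Finally, we calculate the success and failure
probability of a program $P$ as follows:


$$\begin{aligned}
\frac{\#(\mathbb{E}) - \#(d^{viol}_{input})}{\#(\mathbb{E})} &\leq Pr^s(P) = \frac{\#(\mathbb{E}_{sat})}{\#(\mathbb{E})} \leq \frac{\#(d^{sat}_{input})}{\#(\mathbb{E})} \\
\frac{\#(\mathbb{E}) - \#(d^{sat}_{input})}{\#(\mathbb{E})} &\leq Pr^f(P) = \frac{\#(\mathbb{E}_{viol})}{\#(\mathbb{E})} \leq \frac{\#(d^{viol}_{input})}{\#(\mathbb{E})}
\end{aligned} \tag{1}$$


Note that $Pr^s(P) + Pr^f(P) = 1$.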
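The bounds in (1) are simple arithmetic once the three counts are known. The sketch below computes them with exact fractions; the function names are illustrative, and the counts passed in are assumed to come from a model counter. The second call uses made-up counts only to show what the bounds look like when the inferred preconditions overlap.

```python
from fractions import Fraction
from math import prod


def input_space_size(intervals):
    """#(E): product of the sizes of the input intervals [n, n']."""
    return prod(hi - lo + 1 for lo, hi in intervals)


def reliability_bounds(total, n_sat, n_viol):
    """Lower and upper bounds on the success and failure probabilities, as in (1).

    total  = #(E)
    n_sat  = number of points of E satisfying the necessary precondition for satisfaction
    n_viol = number of points of E satisfying the necessary precondition for violation
    """
    success = (Fraction(total - n_viol, total), Fraction(n_sat, total))
    failure = (Fraction(total - n_sat, total), Fraction(n_viol, total))
    return success, failure


total = input_space_size([(0, 19)])                        # one input i in [0,19]
exact_success, exact_failure = reliability_bounds(total, 10, 10)
loose_success, loose_failure = reliability_bounds(total, 12, 11)  # hypothetical counts
print(exact_success, exact_failure)
print(loose_success, loose_failure)
```

When the two preconditions partition the input space, the lower and upper bounds coincide and the probabilities are exact; any overlap between them shows up directly as the gap between the bounds.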


Example 2. Consider the program $P_1$ from Fig. 1. We have $\mathbb{E} = \{[i \mapsto n] \mid n \in [0,19]\}$,
and so $\#(\mathbb{E}) = 20$. Using forward and two backward analyses, we
obtain $d^{sat}_{input} = (i \geq 10)$ and $d^{viol}_{input} = (i < 10)$, and so $\#(d^{sat}_{input}) = 10$ and
$\#(d^{viol}_{input}) = 10$. Thus, the success and failure probabilities are:

$$Pr^s(P_1) = \frac{10}{20}\ (50\%), \quad \text{and} \quad Pr^f(P_1) = \frac{10}{20}\ (50\%)$$
We use model counting and the LATTE tool [1] to determine the number of
solutions of a given constraint. LATTE accepts LIA constraints expressed as a
system of linear inequalities, each of which defines a half-space, encoded as the
matrix inequality $Ax \leq B$, where $A$ is an $m \times n$ matrix of coefficients and $B$ is an
$m \times 1$ column vector of constants. Most LIA constraints can easily be converted
into the form $a_1 x_1 + \ldots + a_n x_n \leq b$. For example, $\geq$ and $>$ can be flipped
by multiplying both sides by $-1$, and strict inequalities $<$ can be converted by
decrementing the constant $b$. In LATTE, equalities $=$ can be expressed directly. If
we have disequalities $\neq$, they can be handled by counting a set of constraints that
encode all possible solutions. For example, the constraint $\alpha \land (x_1 \neq x_2)$ is handled
by finding the sum of solutions for $\alpha \land (x_1 \leq x_2 - 1)$ and $\alpha \land (x_1 \geq x_2 + 1)$. For a
system $Ax \leq B$, where $A$ is an $m \times n$ matrix and $B$ is an $m \times 1$ column vector,
the input LATTE file is:

m n+1
B -A


where the first line indicates the matrix size: the number of inequalities $m$ by
the number of variables $n$ plus one. The following lines encode all inequalities.
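The file layout just described is easy to generate mechanically. The sketch below renders a system $Ax \leq B$ in that layout and, as an independent check for small domains, counts the integer solutions by enumeration instead of invoking LATTE. The function names are illustrative; only the layout (header line, then one row of the constant followed by the negated coefficients) is taken from the description above.

```python
from itertools import product


def latte_input(a, b):
    """Render A x <= b as text: header 'm n+1', then one row 'b_i -A_i' per inequality."""
    rows = [f"{len(a)} {len(a[0]) + 1}"]
    for coeffs, rhs in zip(a, b):
        rows.append(" ".join(str(v) for v in [rhs] + [-c for c in coeffs]))
    return "\n".join(rows) + "\n"


def brute_force_count(a, b, ranges):
    """Count integer points in the given ranges that satisfy every row of A x <= b."""
    return sum(
        1
        for point in product(*ranges)
        if all(sum(c * x for c, x in zip(row, point)) <= rhs for row, rhs in zip(a, b))
    )


# Precondition i >= 10 within the input domain 0 <= i <= 19, written as A x <= b:
#   -i <= -10,   i <= 19,   -i <= 0
A = [[-1], [1], [-1]]
b = [-10, 19, 0]
text = latte_input(A, b)
count = brute_force_count(A, b, [range(0, 20)])
print(text)
print(count)
```

The enumeration is exponential in the number of variables and is only meant as a cross-check; avoiding it is precisely the reason for using a dedicated model counter.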


5 Extension to non-deterministic programs 


Let us reconsider the program $P_2$ from Fig. 2, where the assignment j:=j+1 in the
loop body (point 5) is now replaced with: j:=j+[0,1]. That is, the variable j is incremented by
a uniformly distributed random integer between 0 and 1 at each iteration. We
denote this non-deterministic program as $P_3$ (taken from [26]), given below:


void main() {
  1: int j:=[0,9]; l_input:
  2: int i:=0;
  3: while (i < 100) {
  4:   i:=i+1;
  5:   j:=j+[0,1]; }
  l_final: assert (j <= 105); }


A forward invariant analysis will find that $0 \leq j \leq 109$ holds at $l_{final}$, and so
the assertion $(j \leq 105)$ can be both satisfied and violated. A backward necessary
condition analysis for assertion satisfaction will infer the precondition $0 \leq j \leq 9$
at $l_{input}$, since for any value $j \in [0,9]$ there exists a program execution satisfying
the assertion (e.g., consider the executions where the random integer from [0,1]
always evaluates to 0 in the body of while). However, a backward sufficient
condition analysis for assertion satisfaction computes the set of input states such
that all program executions branching from them satisfy the assertion. In this
case, the sufficient condition analysis will infer the precondition $0 \leq j \leq 5$ at
$l_{input}$, since even if the random integer from [0,1] always evaluates to 1 in the
body of while, the assertion will always hold. As a result of this, we can conclude
that the success probability is greater than or equal to $\frac{6}{10} = 60\%$.
We can see that necessary and sufficient preconditions are different in the
presence of non-determinism [26]. Note that, if the non-determinism is increased
in a program, then the set of sufficient preconditions will be reduced, while the set
of necessary preconditions will be enlarged. For non-deterministic programs, $\mathbb{E}_{sat}$
and $\mathbb{E}_{viol}$ are subsets of input environments $\mathbb{E}$ that definitely lead to satisfaction
and violation of the final assertion for all possible non-deterministic choices,
respectively. We define the success probability $Pr^s(P)$ as the probability that
a program terminates successfully with the target assertion being satisfied for
all possible non-deterministic choices taken at each step. The failure probability
$Pr^f(P)$ is the probability that a program hits a failure caused by the target
assertion being violated for all possible non-deterministic choices taken at each
step. We now show how to compute the success and failure probabilities for
non-deterministic programs using sufficient conditions.


Remark. Note that in the case of deterministic programs, $\mathbb{E}_{sat}$ and $\mathbb{E}_{viol}$ form a
partition of the set of input environments $\mathbb{E}$ ($\mathbb{E}_{sat} \cup \mathbb{E}_{viol} = \mathbb{E}$), thus we have
$Pr^s(P) + Pr^f(P) = 1$ for any deterministic program $P$. However, for
non-deterministic programs this is no longer true. That is, $\mathbb{E}_{sat} \cup \mathbb{E}_{viol} \subseteq \mathbb{E}$ and
$Pr^s(P) + Pr^f(P) \leq 1$ for any non-deterministic program $P$. This means that
there may exist input environments for which the target assertion can be
both satisfied and violated, depending on the non-deterministic choices made at each
step of the given execution. For example, in the above program $P_3$, for input
environments that satisfy $6 \leq j \leq 9$, the target assertion is satisfied (when [0,1]
in the body of while always evaluates to 0) and violated (when [0,1] in the
body of while always evaluates to 1), so those input environments are neither
in $\mathbb{E}_{sat}$ nor in $\mathbb{E}_{viol}$. We have $\mathbb{E}_{sat} = \{\rho \mid 0 \leq \rho(j) \leq 5\}$ and $\mathbb{E}_{viol} = \emptyset$, thus
$Pr^s(P_3) = 60\%$ and $Pr^f(P_3) = 0\%$.
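The difference between necessary and sufficient preconditions for this program can be checked without enumerating all sequences of choices, because the final value of j depends only on how many of the 100 iterations add 1. The sketch below uses that observation; it assumes the assertion is j <= 105 (the reading consistent with the 60% figure) and the names are illustrative.

```python
from fractions import Fraction


def final_j_range(j0, iterations=100):
    """Smallest and largest final value of j when each iteration adds 0 or 1."""
    return j0, j0 + iterations


inputs = range(0, 10)  # j in [0,9]

# Necessary: at least one sequence of choices satisfies j <= 105.
necessary_sat = [j for j in inputs if final_j_range(j)[0] <= 105]
# Sufficient: every sequence of choices satisfies j <= 105.
sufficient_sat = [j for j in inputs if final_j_range(j)[1] <= 105]
# Inputs for which every sequence of choices violates the assertion.
sufficient_viol = [j for j in inputs if final_j_range(j)[0] > 105]

pr_success = Fraction(len(sufficient_sat), len(inputs))
pr_failure = Fraction(len(sufficient_viol), len(inputs))
print(necessary_sat, sufficient_sat, sufficient_viol)
print(pr_success, pr_failure)
```

The inputs 6 to 9 appear in the necessary set but not in the sufficient one, which is exactly the part of the input space that makes the two probabilities sum to less than 1.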


Syntax. The extended non-deterministic programming language includes the
same expression and statement productions as previously (see Section 3), but we
add support for non-deterministic expressions by using integer intervals $[n, n']$:


$$e ::= \ldots \mid [n, n']$$


The integer interval $[n, n']$ denotes a uniformly distributed random integer from
the interval $[n, n']$ (non-deterministic choice of an integer). Note that the interval
assignment $x := [n, n']$ can now be freely used everywhere in programs, not only
in the input section as in deterministic programs.


Concrete semantics. We now consider the problem of backward sufficient condition
inference. Given an invariant set $F$ to obey, we want to infer the set of
input states $cond(F)$ that guarantee that all program executions branching from
them, for all possible non-deterministic choices taken at each step, stay in $F$:


$$cond(F) = \mathrm{gfp}_F\, \lambda X.\ X \cap \widetilde{pre}(X)$$


where $\widetilde{pre}(X) = \{\sigma \in \Sigma \mid \forall \sigma' \in \Sigma.\ \sigma \to \sigma' \implies \sigma' \in X\}$ is the set of states
which represent predecessors only of states in $X$. Note that the function $\widetilde{pre}(X)$
differs from the function $pre(X)$ used in Section 3, that is $\widetilde{pre}(X) \neq pre(X)$, if
the transition system is non-deterministic (i.e. some states have several successors
or none). Using $\widetilde{pre}(X)$ ensures that the invariant set $F$ holds for all sequences of
non-deterministic choices made at each execution step, while $pre(X)$ ensures that
the invariant set $F$ holds for at least one sequence of non-deterministic choices.
Note that $\widetilde{pre}(X) = pre(X)$ for deterministic programs, since $|post(\{\sigma\})| = 1$
for every state $\sigma \in \Sigma$ in this case.
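The two predecessor operators are easy to compare on a small explicit transition system. The sketch below is a toy illustration with made-up state names: the existential operator keeps states with at least one successor in X, whereas the universal operator keeps states all of whose successors are in X, which includes states that have no successor at all.

```python
def pre(states, succ, x):
    """States with at least one successor in x (existential predecessor)."""
    return {s for s in states if succ.get(s, set()) & x}


def pre_tilde(states, succ, x):
    """States all of whose successors lie in x (universal predecessor)."""
    return {s for s in states if succ.get(s, set()) <= x}


# Toy non-deterministic system: 'a' may move to 'b' or 'c'; 'd' and 'e' are blocking.
states = {"a", "b", "c", "d", "e"}
succ = {"a": {"b", "c"}, "b": {"d"}, "c": {"e"}}
x = {"b", "d"}

existential = pre(states, succ, x)
universal = pre_tilde(states, succ, x)
print(sorted(existential), sorted(universal))
```

State 'a' is in the existential result only, since one of its two successors falls outside X; the blocking states 'd' and 'e' are in the universal result only, since the condition over their successors holds vacuously.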


Abstract semantics. In order to compute an under-approximating set of sufficient
preconditions, we require an abstract domain $D$ with the following backward
abstract operators: meet $\sqcap_D$, backward assignment
$b\text{-}assign\text{-}under_D : Var \times Exp \times D \times D \to D$, backward tests $b\text{-}filter\text{-}under_D : Exp \times D \times D \to D$,
and a lower widening $\underline{\nabla}_D$ [26]. The above abstract operators represent a sound
under-approximation of the corresponding concrete operators. We let $\mathrm{gfp}^{\#under}$
denote an abstract pre-fixpoint operator, derived using lower widening $\underline{\nabla}_D$, that
under-approximates the concrete gfp.

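We design two backward sufficient condition abstract interpreters that propagate
backwards the invariants ensuring that the final assertion is satisfied, $d^{sat}_{final}$,
and violated, $d^{viol}_{final}$, respectively. They are based on a family of backward transfer
functions $\overleftarrow{\delta}^{under}_{l,l'} : D \times D \to D$, which for some statements are defined as:


— assignment $l_0$ : x:=e; $l_1$ : $\overleftarrow{\delta}^{under}_{l_0,l_1}(d, d') = \text{b-assign-under}_{D}(x, e, d, d')$

— if statement $l_0$ : if (e) then {$l_t$ : s; $l'_t$} else {$l_f$ : s; $l'_f$}; $l_1$ :
$\overleftarrow{\delta}^{under}_{l_0,l_t}(d, d') = \text{b-filter-under}_{D}(e, d, d')$, $\overleftarrow{\delta}^{under}_{l_0,l_f}(d, d') = \text{b-filter-under}_{D}(\neg e, d, d')$, and
$\overleftarrow{\delta}^{under}_{l'_t,l_1}(d, d') = d \cap d'$, $\overleftarrow{\delta}^{under}_{l'_f,l_1}(d, d') = d \cap d'$

— $l_0$ : while (e) do {$l_h$ : s; $l'_h$}; $l_1$ : $\overleftarrow{\delta}^{under}_{l_0,l_h}(d, d') = \text{b-filter-under}_{D}(e, d, d')$,
$\overleftarrow{\delta}^{under}_{l_0,l_1}(d, d') = \text{b-filter-under}_{D}(\neg e, d, d')$, and $\overleftarrow{\delta}^{under}_{l'_h,l_0}(d, d') = d \cap d'$.


The soundness of $\{\overleftarrow{\delta}^{under}_{l,l'} \mid l, l' \in L\}$ is written as: $\forall d, d' \in D,\ \forall \rho \in \gamma_D(d),\ \rho' \in
\gamma_D(d'),\ \rho \in \gamma_D(\overleftarrow{\delta}^{under}_{l,l'}(d, d')) \implies (\langle l, \rho \rangle \to \langle l', \rho' \rangle)$.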
The backward sufficient condition interpreters $\overleftarrow{F}^{\#under}_{cond}$ take, at every location
$l \in L$, the meet of the backward transfer functions $\overleftarrow{\delta}^{under}_{l,l'}$ over all locations $l' \in L$.
The results of the backward analyzers are the abstract greatest fixpoints $C^{\#under}_{sat}$
and $C^{\#under}_{viol}$, obtained by applying $\mathrm{gfp}^{\#under}$ to the interpreter instantiated for
satisfaction and for violation of the final assertion, respectively. The sufficient
preconditions that the final assertion is satisfied and violated are
$d^{sat,under}_{input} = C^{\#under}_{sat}(l_{input})$ and $d^{viol,under}_{input} = C^{\#under}_{viol}(l_{input})$, respectively. We can now
compute the under-approximated sets $E^{\#under}_{sat}$ and $E^{\#under}_{viol}$ of input environments
$E_{sat}$ and $E_{viol}$ that definitely lead to satisfaction and violation of the final
assertion as:


$E^{\#under}_{sat} = \gamma_D(d^{sat,under}_{input}), \quad E^{\#under}_{viol} = \gamma_D(d^{viol,under}_{input})$


such that $E^{\#under}_{sat} \subseteq E_{sat}$ and $E^{\#under}_{viol} \subseteq E_{viol}$.


Computing success and failure probabilities. As before, we instantiate D with
the polyhedra numeric abstract domain, since all under-approximating sound
backward operators for it have been implemented in the APRON library [26]. The
sufficient preconditions $d^{sat,under}_{input}$ and $d^{viol,under}_{input}$ are under-approximations of $E_{sat}$
and $E_{viol}$ respectively, so $\#(d^{sat,under}_{input}) \le \#(E_{sat})$ and $\#(d^{viol,under}_{input}) \le \#(E_{viol})$.
Moreover, the input environments which are not in $\gamma_D(d^{sat,under}_{input})$, that is they are
in $\neg\gamma_D(d^{sat,under}_{input})$, may lead to the violation of the assertion. Therefore, we have
$\#(d^{viol,under}_{input}) \le \#(E_{viol}) \le \#(E) - \#(d^{sat,under}_{input})$. By similar reasoning, we can
also establish that: $\#(d^{sat,under}_{input}) \le \#(E_{sat}) \le \#(E) - \#(d^{viol,under}_{input})$. We calculate
the success and failure probability of a program P as follows:


$\frac{\#(d^{sat,under}_{input})}{\#(E)} \le Pr^{s}(P) = \frac{\#(E_{sat})}{\#(E)} \le \frac{\#(E) - \#(d^{viol,under}_{input})}{\#(E)}$    (2)

$\frac{\#(d^{viol,under}_{input})}{\#(E)} \le Pr^{f}(P) = \frac{\#(E_{viol})}{\#(E)} \le \frac{\#(E) - \#(d^{sat,under}_{input})}{\#(E)}$


Example 3. Consider the program $P_3$ from the beginning of this section. Using
two backward sufficient condition analyses, we obtain $d^{sat,under}_{input} = (0 \le j \le 5)$
and $d^{viol,under}_{input} = (\bot_D)$, and so $\#(d^{sat,under}_{input}) = 6$ and $\#(d^{viol,under}_{input}) = 0$. Thus, the
success and failure probabilities are:


$(60\%)\ \frac{6}{10} \le Pr^{s}(P_3) \le 1\ (100\%)$ and $(0\%)\ 0 \le Pr^{f}(P_3) \le \frac{4}{10}\ (40\%)$
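
The bounds in Example 3 follow mechanically from the two model counts. The sketch below is an illustration, not the tool's code: the function name `probability_bounds` is hypothetical, and the input space size of 10 is an assumption inferred from the fact that 6 satisfying inputs correspond to 60%.

```python
from fractions import Fraction


def probability_bounds(n_sat_under, n_viol_under, n_inputs):
    """Bounds on success and failure probabilities from sufficient-precondition counts.

    n_sat_under / n_viol_under: numbers of inputs that definitely satisfy / violate
    the assertion; n_inputs: size of the whole input space.
    """
    if n_inputs <= 0:
        raise ValueError("the input space must be non-empty")
    if n_sat_under + n_viol_under > n_inputs:
        raise ValueError("the two under-approximations cannot overlap")
    total = Fraction(n_inputs)
    success = (n_sat_under / total, (total - n_viol_under) / total)
    failure = (n_viol_under / total, (total - n_sat_under) / total)
    return success, failure


# Counts from Example 3; the input space size 10 is an assumption (see above).
success, failure = probability_bounds(6, 0, 10)
```

Exact rational arithmetic is used so that coinciding lower and upper bounds can be compared without floating-point error.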


6 Implementation 


We now describe the implementation and evaluation of the ideas presented so
far. The evaluation aims to demonstrate the following objectives:


O1: The probabilistic analysis can be used to analyze the behaviour of various 
interesting programs; 

O2: The probabilistic analysis gives exact results (with no precision loss) in many 
cases, especially for deterministic programs; 

O3: The running time of the probabilistic analysis is largely insensitive to the
domain sizes of input variables;

O4: We can find practical application scenarios of using our probabilistic analysis 
to efficiently analyze C programs. 


Implementation. We have implemented a prototype probabilistic static analyzer
that accepts programs written in a subset of C. It does not support struct and
union types, and provides only limited support for arrays and pointers. The only
basic data types considered are integers. As output, the tool reports the upper
and lower bounds of probabilities that the target assertion is satisfied or violated.
The prototype tool is written in OCAML. As the abstract analysis domain D for
encoding program properties, we use the polyhedra numeric abstract domain [10].
All abstract operators and sound transfer functions for the polyhedra domain
are provided by the APRON library [21,26]. The tool performs one forward
reachability analysis and two backward necessary/sufficient condition analyses
(one for satisfaction and one for violation of the assertion). The tool calls a
model counter, LATTE [1], to determine the number of solutions to the discovered
preconditions for satisfaction or violation of the assertion. Note that if an input
state satisfies the discovered sufficient precondition for satisfaction (resp., violation) of
the assertion, then all program executions branching from that state will satisfy
(resp., violate) the given assertion. The analyses proceed by structural induction
on the program syntax, iterating while loops until a fixed point is reached. They
compute the unique solution that assigns an element of the abstract domain
to every program point.


Experimental setup and benchmarks. All experiments are executed on a 64-bit
Intel® Core™ i5 CPU, Lubuntu VM, with 8 GB memory. The reported
times represent the average runtime of five independent executions. We report
TIME_an to perform all static analysis tasks (one forward plus two backward static
analyses), TIME_pr to compute the needed probability bounds (call to LATTE
plus additional calculations), and TIME to complete the overall probabilistic
analysis task. The implementation, benchmarks, and all results obtained are
available from: https://aleksdimovski.github.io/probab-analysis.html (or
https://github.com/aleksdimovski/probab_analyzer).

For our experiments, we use a dozen C programs taken from several
folders (categories) of the 8th International Competition on Software Verification
(SV-COMP 2019, see footnote 6), as well as from the abstract interpretation community
[26,30]. The folders from SV-COMP 2019 we consider are: loops, loop-lit,
termination-crafted (which is denoted ter-crafted for short), as well as
termination-restricted-15 (which is denoted ter-restricted for short). We
have selected some numeric programs with integers that our tool can handle.
We have manually added input sections, and in some of the programs we have
also defined target assertions. Then, we have analyzed those programs using
our prototype static analyzer. Table 1 summarizes relevant characteristics for
each benchmark: the folder (source) where it is taken from, the number of lines
of code (LOC), and the number of integer variables (#var). There are two
classes of benchmarks in Table 1 separated by a double horizontal line. The
first (upper) class of benchmarks consists of deterministic programs for which
backward necessary condition analysis is performed, while the second (lower)
class of benchmarks consists of non-deterministic programs for which backward
sufficient condition analysis is performed.


Performance. Table 1 shows the performance of our technique on a set of small
and compelling examples (addresses Objective (O1)). For most
of our deterministic benchmarks, the technique gives exact results without any
approximation (marked with ✓ in the EXACT column of Table 1).
This means that the lower and upper bounds for success and failure probabilities
coincide. This is due to the fact that we use the expressive and very precise
polyhedra abstract domain (addresses Objective (O2)). For the remaining cases,
the technique gives approximate results (marked with ≈ in the EXACT
column of Table 1), since the abstraction was too coarse to calculate exact results.
We can also see that the static analysis time TIME_an dominates the overall
probabilistic analysis time TIME, whereas the probability computation time
TIME_pr is a small fraction of the total time. The small probability computation
times indicate that the preconditions obtained from our analyses are relatively simple,
and so LATTE can handle them very efficiently. We have also experimented with
different domain sizes n of input variables (n = 10 and n = 1000), where n
denotes the number of possible values per input variable. We
obtain similar running times for n = 10 and n = 1000, which means
that the performance is not affected by the fact that inputs come from a bigger
pool of possible values. This is mostly due to the fact that LATTE and APRON
are largely insensitive to those values in terms of time (addresses Objective (O3)).
In general, the obtained probability bounds provide non-trivial information about
the behaviour of these programs and are quite hard to estimate by hand even if
the programs in question are small.


6 https://sv-comp.sosy-lab.org/2019/


Application scenarios. Consider the following program (called Waldkirch.c from 
termination-crafted folder of SV-COMP 2019): 


We want to prove this assertion, since, for example, later on in the program there
are references to an array using the index x+1 (e.g. a[x+1] :=0). In this way, we
want to verify that there are no array-out-of-bounds references. The tool will find
that the necessary precondition for assertion satisfaction is: −1 ≤ x ≤ 4, thus
computing the success probability of 60%. The found necessary precondition for
assertion violation is: −5 ≤ x ≤ −2, so the failure probability is 40% (addresses
Objective (O4)).
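
The two percentages can be reproduced by counting the integers in each interval. The sketch below is illustrative only: the helper `count_interval` is hypothetical, and the input range [−5, 4] is an assumption, chosen because it is the smallest range that contains both preconditions and yields the reported 60% and 40%.

```python
def count_interval(lo, hi, dom_lo, dom_hi):
    """Number of integers in [lo, hi] intersected with the input range [dom_lo, dom_hi]."""
    lo, hi = max(lo, dom_lo), min(hi, dom_hi)
    return max(0, hi - lo + 1)


DOM_LO, DOM_HI = -5, 4  # assumed input range of x (not stated explicitly in the text)
n_inputs = DOM_HI - DOM_LO + 1

n_sat = count_interval(-1, 4, DOM_LO, DOM_HI)    # necessary precondition for satisfaction
n_viol = count_interval(-5, -2, DOM_LO, DOM_HI)  # necessary precondition for violation

success_upper = 100 * n_sat // n_inputs   # upper bound on the success probability, in %
failure_upper = 100 * n_viol // n_inputs  # upper bound on the failure probability, in %

# Necessary preconditions give upper bounds; when the two upper bounds add up to 100%,
# the lower bound (100 - failure_upper) meets the upper bound and the result is exact.
is_exact = success_upper + failure_upper == 100
```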


Approximate results. We now give an example where we obtain a precision loss in
practice due to the approximation inherent in abstract analyses. Consider the
following program (taken from [30]):


①: int x:=[0,9], y:=[0,9]; l_input :
②: int s:=x-y;

③: if (s ≥ 2) y:=y+2;

l_final : assert (y > 3);


The forward analysis will infer that the program can both satisfy and violate the
assertion. The backward necessary condition analysis for assertion satisfaction
will discover the constraint: x + 2y ≥ 8 ∧ 0 ≤ x ≤ 9 ∧ 2 ≤ y ≤ 9, thus we
find that the upper bound probability for assertion satisfaction is 74%. The
backward necessary condition analysis for assertion violation will discover the
constraint: 2x + 5y ≤ 23 ∧ 0 ≤ x ≤ 9 ∧ 0 ≤ y ≤ 3, thus we find that the upper
bound probability for assertion violation is 32%. In this way, we conclude that
the success probability is between 68% and 74%, while the failure probability is
between 26% and 32%. On the other hand, we can calculate by hand that the
success probability is exactly 71%, while the failure probability is exactly 29%.


Table 1: Experimental evaluation for probabilistic static analyses of C programs.
This table contains the following columns: (1) Bench. - the name of the
analyzed program; (2) source - the source (folder) where the benchmark is taken
from; (3) LOC - the number of lines of code; (4) #var - the number of integer
variables; (5) TIME_an^10 - the static analysis time in seconds for input domains of size
10; (6) TIME_pr^10 - the probability computation time in seconds for input domains
of size 10; (7-8) TIME^10 and TIME^1000 - the overall times in seconds required
to completely analyze a benchmark which has input domain of size 10 and size
1000, respectively; (9) Exact - the preciseness of the reported result (✓ - result is
exact, ≈ - result is approximate). Benchmarks above the double horizontal line
are deterministic programs, while those below are non-deterministic programs.


Bench.           | source          | LOC | #var | TIME_an^10 | TIME_pr^10 | TIME^10 | TIME^1000 | Exact
count_up_down*.c | loops           | 20  | 3    | 0.043      | 0.001      | 0.004   | 0.049     | ✓
hhk2008.c        | loop-lit        | 20  | 4    | 0.103      | 0.001      | 0.104   | 0.113     | ✓
gsv2008.c        | loop-lit        | 20  | 2    | 0.027      | 0.001      | 0.028   | 0.030     | ✓
Log.c            | ter-restricted  | 30  | 4    | 0.194      | 0.001      | 0.195   | 0.197     | ≈
Mono3-1.c        | loops-crafted-1 | 15  | 2    | 0.044      | 0.001      | 0.045   | 0.046     | ≈
Waldkirch.c      | ter-crafted     | 20  | 1    | 0.010      | 0.001      | 0.011   | 0.012     | ✓
bwd-loop1a.c     | [26]            | 15  | 1    | 0.008      | 0.001      | 0.009   | 0.010     | ✓
bwd-loop2.c      | [26]            | 15  | 2    | 0.020      | 0.002      | 0.022   | 0.022     | ✓
example1a.c      | [30]            | 10  | 1    | 0.008      | 0.001      | 0.009   | 0.008     | ✓
example7a.c      | [30]            | 15  | 2    | 0.023      | 0.001      | 0.024   | 0.026     | ✓
=================================================================================================
for-bounded*.c   | loops           | 30  | 4    | 0.049      | 0.002      | 0.051   | 0.053     | ≈
bwd-loop7.c      | [26]            | 15  | 2    | 0.027      | 0.001      | 0.029   | 0.030     | ≈
bwd-loop10.c     | [26]            | 20  | 2    | 0.046      | 0.001      | 0.047   | 0.048     | ≈
example7b.c      | [30]            | 15  | 2    | 0.039      | 0.001      | 0.040   | 0.048     | ≈
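
The hand calculation for the approximate-results example (inputs x and y ranging over [0,9]) can be cross-checked by brute-force enumeration of all 100 inputs. The sketch below is not part of the tool, which uses LATTE for counting. It assumes the branch condition s ≥ 2, the assertion y > 3, and the violation constraint 2x + 5y ≤ 23, since these are the readings that reproduce the reported 71%, 74% and 32%.

```python
from itertools import product


def satisfies_assertion(x, y):
    """Run the example program on one input and report whether the assertion holds."""
    s = x - y
    if s >= 2:
        y = y + 2
    return y > 3


inputs = list(product(range(10), range(10)))  # x, y in [0, 9]

exact_sat = sum(1 for x, y in inputs if satisfies_assertion(x, y))

# Necessary preconditions reported by the backward analyses (polyhedral constraints).
sat_nec = sum(1 for x, y in inputs if x + 2 * y >= 8 and 2 <= y <= 9)
viol_nec = sum(1 for x, y in inputs if 2 * x + 5 * y <= 23 and 0 <= y <= 3)

# Upper bounds come from the necessary preconditions, lower bounds from the complement.
success_bounds = (len(inputs) - viol_nec, sat_nec)
failure_bounds = (len(inputs) - sat_nec, viol_nec)
```

Since there are exactly 100 inputs, the counts can be read directly as percentages. The three extra points admitted by the satisfaction constraint and the three admitted by the violation constraint are the precision loss caused by the convex polyhedral approximation.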


7 Related work 


Probabilistic analysis of imperative programs based on symbolic execution has
been introduced before [18,17,3,29]. These analyses calculate path probabilities by counting
the number of solutions to a path condition, which represents a constraint on
inputs. The analyses in [18,17] address programs with integer domains and
linear constraints, whereas the analyses in [3] address programs with linear and
complex floating-point computations. While the previous analyses are restricted
to discrete, uniform random variables that take on a finite set of values, the
probabilistic analysis in [29] can also handle non-uniform distributions over the
reals and integers using a branch-and-bound technique over polyhedra. However,
in the presence of loops all of the above analyses based on symbolic execution lose precision,
since they cannot enumerate all program paths. The solution is to consider
bounded exploration of loops and only a finite number of feasible program paths.
Thus, they also define a measure of confidence on the obtained probabilistic
estimations in order to take into account the contribution from the unexplored
feasible paths. For example, if we set the exploration bound of the loop of the program
in Fig. 2 to any number less than 100, both success and failure probabilities
will be 0% and the confidence will also be 0. This is due to the fact that the
while loop in Fig. 2 has to be unrolled at least 100 times in order to obtain a
feasible path on which it can be decided whether the assertion at point l_final is
satisfied or violated. In this work, instead of symbolic execution we use abstract
interpretation to analyze programs and infer preconditions for success and failure.
Thus, our approach for computing program reliability represents one of the
pioneering works that provides a complete and fast treatment of while loops. In
particular, the strength of our approach is being an abstract interpretation of
a complete semantics for computing program reliability. This is stronger than
fixing a priori an incomplete reasoning approach that can miss some feasible
program paths (executions). The work [13] performs a probabilistic analysis of
open programs using symbolic game semantics [12] and model counting. It uses
game semantics to model open programs with undefined identifiers (e.g. calls to
library functions), such that the model takes into account all possible contexts
in which those programs can be placed. In the presence of loops and undefined
functions, bounded exploration in the model is also used to obtain a feasible
analysis. Probabilistic model checking [2] is yet another approach to perform
probabilistic analysis on a high-level design of software. However, such high-level
models are difficult to maintain and may abstract important details that impact
the chance of property satisfaction. So the goal is to do probabilistic analysis
directly on source code as here, not on high-level models.


Backward precondition analyses by abstract interpretation have also been
used in practice for a long time [4,9,26,28]. Sufficient preconditions were
first introduced by Bourdoncle [4] in his work on abstract debugging of deter-
ministic programs. He uses a combination of forward-backward analyses to find
preconditions for invariant and intermittent assertions to always hold. Cousot et
al. [9] propose a method for automatically inferring contract preconditions for in-
termittent assertions. The preconditions extracted by their method are necessary
preconditions, i.e. they do not exclude unsafe executions. Miné [26] presents a
method for automatically inferring sufficient preconditions of non-deterministic
programs by using a polyhedral backward analysis. The under-approximating
sound abstract operators for this backward analysis are implemented as part of
the APRON library. Rival [28] uses forward-backward analysis to inspect more
closely reported alarms by ASTRÉE, which are then classified as true errors (bugs)
or false alarms. Urban and Miné [30] use forward-backward analysis for the auto-
matic inference of sufficient preconditions for program termination. The elements
of the analysis domain are decision trees, where decision nodes are labeled with
linear numerical constraints and leaf nodes are affine ranking functions for proving
program termination. Forward-backward analysis schemes have been used in [20]
for the inference of safety properties of declarative synchronous programs. In this
work, for the first time we employ forward-backward precondition analysis for
estimating program reliability.

Static analysis of probabilistic programs by abstract interpretation has also
been a topic of research [27,11]. Monniaux [27] proposes a probabilistic analysis
that annotates abstract domains with upper bounds on the probability measure
associated with abstract objects. However, the measure bound is associated
with the entire abstract object, without tracking how it is distributed amongst
the individual states present in the concretization. This restriction makes the
analysis quite conservative. Cousot and Monerau [11] provide a general framework
that encompasses a variety of probabilistic interpretation schemes. However, no
concrete implementation of the above probabilistic abstract interpretations has
been provided yet. A backward abstract interpretation for probabilistic programs
[23] uses expectations that are real-valued functions of the program state and
quantitative loop invariants. The automatic inference of such quantitative loop
invariants was proposed in the recent work of Katoen et al. [22].


8 Conclusion 


We have presented a new static, abstract interpretation-based approach for com-
puting program reliability, which allows us to calculate upper and lower bounds of
probabilities that a given assertion is satisfied or violated. We construct a combi-
nation of forward-backward abstract analyses, in order to find an approximation
of a set of input states which lead to definite satisfaction (resp., violation) of 
the given assertion. Our approach to calculating program reliability is semantics- 
based and approximate in a provably sound way. Still, it often yields very precise 
results, especially for deterministic programs. 

We currently support only uniform distributions of input values within their
finite discrete domains. In the future, we plan to model imprecision in the input
by different non-uniform distributions, such as Binomial, Poisson, etc. [29]. The
current implementation of LATTE is limited in handling non-uniform distributions,
so we will explore the use of statistical sampling techniques in those cases. Our
focus here is on estimating probability for safety properties. We also plan to
consider liveness properties (such as termination) and expectation queries [30]. An
interesting direction for future work would also be to consider general probabilistic
programs [19], as well as program families implemented with #ifdef-s from the
C-preprocessor where we can use lifted static analyses to efficiently analyze all
variants of the family simultaneously [14,24,15,16].


Abstract. Cyber-physical systems are ubiquitous nowadays. However,
as automation increases, modeling and verifying them becomes increas- 
ingly difficult due to the inherently complex physical environment. Skill 
graphs are a means to model complex cyber-physical systems (e.g., vehi- 
cle automation systems) by distributing complex behaviors among skills 
with interfaces between them. We identified that skill graphs have a high 
potential to be amenable to scalable verification approaches in the early 
software development process. In this work, we suggest combining skill 
graphs with hybrid programs. Hybrid programs constitute a program no- 
tation for hybrid systems enabling the verification of cyber-physical sys- 
tems. We provide the first formalization of skill graphs including a no- 
tion of compositionality and propose SKEDITOR, an integrated frame- 
work for modeling and verifying them. SKEDITOR is coupled with the 
theorem prover KEYMAERA X, which is specialized in the verification 
of hybrid programs. In an experiment exhibiting the follow mode of a ve- 
hicle, we evaluate our skill-based methodology with respect to savings in 
verification effort and potential to find modeling defects at design time. 
Compared to non-compositional verification, the initial verification effort 
needed is reduced by more than 53%. 


Keywords: Deductive verification, design by contract, formal methods, 
theorem proving, KEYMAERA X, hybrid systems, automated reasoning, 
cyber-physical systems 


1 Introduction 


Cyber-physical systems combine digital computations and physical processes by
tightly integrating discrete and continuous dynamics [6]. The last decade has wit-
nessed an increase in the degree of automation in safety-critical cyber-physical
systems (e.g., self-driving cars and transportation in general). At the same
time, the complexity of formally modeling and verifying such systems (e.g., by
means of hybrid systems models [11, 19, 30]) to reason about safety has increased.
Although there is a clear desire for an early identification and
elimination of severe mistakes [9], there is still a remarkable lack of formal meth-
ods integrated in the software development cycle [16, 17]. The challenge is to
derive modeling and verification approaches that are applicable in the early de-
velopment stages (e.g., requirements analysis and design time). To address this
challenge, we present a model-based verification framework unifying the decom-
position and modeling of cyber-physical systems by means of skill graphs [33]
and a formal verification of these models by means of hybrid systems [2, 3, 5, 25].

Fig. 1: Excerpt of a skill graph representing an operation to keep distance to a
leading vehicle. We illustrate informal safety guarantees for the three skills s1,
s2, and s3.

A skill is a simple capability (e.g., acceleration in the context of a vehicle) 
explicitly provided by a cyber-physical system. Skills exhibit specific behaviors 
(i.e., control algorithms) by a mapping to some implementation unit (e.g., source 
code or interacting software components). Skills are assigned to a specific cate- 
gory (e.g., actuator, sensor, or observable behavior) with a defined hierarchy to 
prevent modeling mistakes. This categorization follows the design principle of 
separation of concerns [27], which ensures that skills only have single well-defined 
responsibilities. Separation of concerns is known to have a positive effect on mod- 
eling complexity, comprehensibility, functional reusability, fault localization, and 
artifact traceability [15,36]. Skills can be annotated with safety guarantees ob- 
tained from a preceding requirements analysis, which enables the application of 
verification techniques. 

Skill graphs, informally introduced by Reschka et al. [32,33], are a promising
means to model complex actions of cyber-physical systems from an architectural
point of view. A skill graph [26, 32, 33, 37] is a directed acyclic graph comprising
a set of skills (i.e., nodes) and dependencies between them (i.e., edges). To
describe the properties we want to verify in a skill graph, we illustrate a skill
graph representing a driving task in Figure 1. In this task, a vehicle
autonomously tries to keep a distance of at least 10 m to a leading vehicle. On the
top level, skill Keep distance to leading vehicle (s1) depends on two other
skills, namely (1) the planning skill Select target object (s2) and (2) the action
skill Control the longitudinal dynamics (s3). Whereas sensor-dependent
skills are typically realized by software algorithms only (e.g., deep learning for
detecting an obstacle), actuator-dependent skills (highlighted with a dashed
border) also need to incorporate control theory, as the physical environment has
to be taken into account. Skills are annotated with safety requirements (e.g.,
maximum acceleration or minimum distance to other vehicles). Together with the
skill’s realization and its dependencies on other skills, this requirement expresses
the property we want to verify at design time. Successfully verifying all skills in
the context of a skill graph ensures that the represented task complies with the
complete set of safety requirements.
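
As a minimal sketch of the data structure, the three skills named above can be stored as a mapping from each skill to the skills it depends on, using Python's standard `graphlib` module (Python 3.9 or later). The dictionary layout and the function name `verification_order` are assumptions made for illustration; this is not the SKEDITOR representation, and processing dependencies first is only one plausible order.

```python
from graphlib import CycleError, TopologicalSorter

# Each skill maps to the set of skills it depends on (excerpt from Figure 1).
SKILL_GRAPH = {
    "Keep distance to leading vehicle": {
        "Select target object",
        "Control the longitudinal dynamics",
    },
    "Select target object": set(),
    "Control the longitudinal dynamics": set(),
}


def verification_order(graph):
    """List the skills so that every skill comes after the skills it depends on.

    Raises graphlib.CycleError if the graph contains a cycle, i.e. if it is
    not a valid (acyclic) skill graph.
    """
    return list(TopologicalSorter(graph).static_order())


order = verification_order(SKILL_GRAPH)
```

The top-level skill appears last in `order`, after both skills it depends on.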


Conceptually, skill graphs as applied in this work are used for designing 
and organizing the architecture of a cyber-physical system. First, they facili- 
tate the modeling of complex maneuvers built from simpler skills, which inter- 
act through explicit interfaces. Second, they advocate the systematic reuse of 
ready-to-integrate skills for multiple skill graphs, which reduces maintenance 
costs and increases software quality in general. Third, skill graphs are intuitive
and therefore offer good potential for communicating with stakehold-
ers and non-experts. Typically, skill graphs are supplied with performance mea-
surements with the goal of enforcing safety requirements at run-time. We are the
first to exploit skill graphs to formally reason about the satisfiability of safety 
requirements at design time. Both areas of application complement each other, 
as they cover the full range from static analysis in the design phase to run-time 
verification and monitoring during operation. 


As the foundation for our model-based verification approach, we propose to 
realize skills that interact with the physical environment by means of hybrid 
systems based on the differential dynamic logic dL [28,29,31]. Hybrid systems
represent complex physical systems, typically modeled as automata, where states 
are defined by continuous variables based on differential equations and transitions 
between states are discrete. Differential dynamic logic enables the deductive 
verification of hybrid systems, and as such is suitable for reasoning automatically 
about the correctness of hybrid systems. The key step of our approach is to 
decompose complex tasks of a cyber-physical system into skills connected by 
means of a skill graph and to provide a translation of skills to hybrid systems. 
The combination of skill graphs and hybrid systems allows the identification of 
severe mistakes during early design phases and also — in case of success — to 
generate correctness proofs, which increases trust that the system under design 
behaves as intended. Moreover, we propose a notion of compositionality for skill 
graphs, which is crucial to manage scalability during the verification phase. While 
skill graphs may only model simple functional aspects, they can be assembled to 
exhibit more complex behaviors, and verification results of skills can be reused. 


Fig. 2: Simplified hybrid system of a vehicle with automatic headway control.


We have implemented a prototype for modeling and verifying skill graphs
called SKEDITOR. SKEDITOR supports the graphical modeling of skill graphs,
allows users to specify safety guarantees, and enables formal verification through a
mapping to hybrid programs [30] (i.e., a program notation for hybrid systems as
required by the theorem prover KEYMAERA X [14]). In a case study exhibiting
the follow mode of an automated vehicle, we evaluate SKEDITOR with respect
to its potential to find modeling defects. In particular, SKEDITOR allowed us
to find conceptual defects of control algorithms early on in the design phase of
our case study. To summarize, the contribution of this work is threefold.


— Framework: We are the first to formalize skill graphs and propose skill- 
based verification, a model-based verification technique allowing us to identify 
poorly defined safety requirements in early design phases by combining skill 
graphs with hybrid programs. 

— Tool support: We implemented skill-based verification in a prototypical 
open-source tool called SKEDITOR, which paves the way for users to model 
and verify cyber-physical systems based on skill graphs. 

— Evaluation: We demonstrate our approach on a realistic case study exam- 
ining the follow mode of an automated vehicle. We show that skill-based ver- 
ification decreases effort compared to monolithic modeling. 


2 Background on Hybrid-System Modeling 


A prominent mathematical foundation for cyber-physical systems is constituted 
by hybrid systems [2, 11, 19, 30], which enable a mixed modeling of continuous
dynamics (expressed by differential equations) and discrete dynamics (expressed 
by automata). The states change on the basis of flow conditions. 


Example 1. Consider the example of an automatic headway control of a vehi-
cle depicted in Figure 2. Four variables exist: the host’s current position (x), the
current position of the leading vehicle (x_l), the current velocity (v), and the cur-
rent acceleration (a). The headway control exhibits three states: (s1) the vehicle
is in cruise mode when the current distance to the leading vehicle is equal to a
defined constant D, (s2) the vehicle accelerates when the distance is greater than
D, and (s3) the vehicle decelerates when the distance is less than D, but only
until the vehicle comes to a full stop. The headway control ensures that the dis-
tance to the leading vehicle is approximately equal to D.

Hybrid programs define an imperative-like program notation for hybrid systems
[28]. They support the definition of variables that evolve along a differential
equation and are interpreted by tools such as KEYMAERA X [14]. The
syntax of hybrid programs is as follows.


$\alpha ::= \alpha; \beta \mid \alpha \cup \beta \mid \alpha^{*} \mid x := \theta \mid x := * \mid x' = \theta\ \&\ H \mid\ ?H$    (1)

α; β represents the sequential composition of two hybrid programs. α ∪ β expresses
the non-deterministic choice between two hybrid programs. α* expresses that the
execution of α may be repeated zero or more times. The discrete assignment to
x is either a term θ (possibly over x) or an arbitrary value represented by the
wildcard *. The continuous evolution of a variable x along a differential equation
is described by x’ = θ & H, where H is an optional evolution domain. Finally,
?H describes a testable condition that aborts the evolution if H is false. For
instance, the program α ≡ (v := *; a := *; ?(−a_b ≤ a ≤ 0); {v’ = a & v ≥ 0}) sets
velocity v to an arbitrary value and acceleration a to a value between −a_b (i.e.,
maximum braking force) and zero. The execution stops nondeterministically at
any time but at the latest before velocity v reaches a negative value.
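
To make the informal reading of the example program concrete, the following Python sketch simulates single runs of it. It is an illustration only and not how KEYMAERA X treats hybrid programs (the prover reasons symbolically about all runs). The sampling ranges, the default value of `a_b`, and the cap of 10 time units when a = 0 are arbitrary assumptions, and `run_once` is a hypothetical helper name.

```python
import random


def run_once(rng, a_b=5.0):
    """Simulate one run of (v := *; a := *; ?(-a_b <= a <= 0); {v' = a & v >= 0}).

    Returns the final velocity, or None if the run is aborted.
    """
    v = rng.uniform(-10.0, 30.0)    # v := *  (arbitrary value, sampled from an assumed range)
    a = rng.uniform(-2 * a_b, a_b)  # a := *
    if not (-a_b <= a <= 0.0):      # ?(-a_b <= a <= 0): a failed test aborts the run
        return None
    if v < 0.0:                     # the evolution domain v >= 0 must hold from the start
        return None
    # v(t) = v + a*t stays non-negative up to t_max = v / (-a); unbounded if a == 0.
    t_max = v / -a if a < 0.0 else 10.0
    t = rng.uniform(0.0, t_max)     # non-deterministic stopping time
    return max(0.0, v + a * t)      # clamp floating-point round-off at the boundary


rng = random.Random(0)
results = [run_once(rng) for _ in range(1000)]
finished = [r for r in results if r is not None]
```

Every run that is not aborted ends with a non-negative velocity, which matches the statement that the evolution stops at the latest before v becomes negative.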

Semantics of hybrid programs are based on differential dynamic logic d£ [28, 
29,31] to specify and verify properties of hybrid programs associated with a 
skill in a skill graph. Models specified in d£ can be verified with KEYMAERA 
X, a matured open-source theorem prover for hybrid programs. The following 
grammar describes all valid formulas of d£. Symbol ~ is a placeholder for a 
comparison operator (i.e., ~€ {<, <, >, >, =, #}) between two terms 0; and 
O2. Terms are polynomials with rational coefficients over the set of continuous 
variables. 


P ::= 0, ~ O2|-€8|PAV|EVV|S>V|\Vx@|ArG| [alo (2) 


The semantics of the logical connectives is defined as in first-order logic. Additionally,
the modal formula $[\alpha]\Phi$ holds if all runs of the hybrid program $\alpha$ end
in a state that satisfies the given condition $\Phi$. Following the idea of Hoare-style
specification in classical deductive reasoning [1, 7, 10, 18, 34], we are particularly
interested in proving the validity of the condition $\Psi \rightarrow [\alpha]\Phi$, with $\Psi$ expressing
the assumptions we have and $\Phi$ expressing the guarantees to be met by the hybrid program $\alpha$.


3 A Formalization for Skill-Based Modeling 


In this section, we propose the first formalization of modeling cyber-physical 
systems based on skill graphs. First, we define the essence of a skill. Second, 
we continue with the definition of a skill graph and what makes it well-formed. 
Third, we define how to compose skill graphs to exhibit more complex behaviors. 


3.1 Formalizing Skills 


In the context of cyber-physical systems, skills describe fine-grained executable 
activities inspired by human behaviors [32,33]. For instance, a skill may repre- 
sent longitudinal driving (i.e., driving with constant velocity) or even a more 
complex combination of longitudinal and lateral maneuvers (i.e., following the
lane). To ensure that such maneuvers are executed safely, skills are associated 
with so-called safety guarantees, which they must fulfill to be considered safe. 
For example, a skill exhibiting the following of a leading vehicle should keep a 
minimum distance of a specified constant D (cf. Fig. 2). Informal safety guaran- 
tees are typically formulated by experts who identify numerous hazardous sce- 
narios with respect to a maneuver and resolutions to prevent them. 

The implementation of skills was only vaguely specified before. Typically, 
skills are implemented by software components [33]. However, our goal of early 
verification at design time requires us to also consider a model of the physical
environment. Therefore, we propose to implement skills by hybrid programs [28],
which already incorporate assumptions about the physical environment and en- 
able the verification of implementation against safety guarantees at design time. 

To separate concerns, a skill has an associated type. We define the set Type
= {observable behavior, action, perception, planning, sensor, actuator},
which categorizes the purpose of a skill. Moreover, a skill has dependency
relationships with other skills. Informally, the idea is that a hybrid program of
a skill may introduce a set of continuous state variables, their computation, and
their valid domains (e.g., velocity $v \in [0, 60]$ with $v' = a$), but may also require
the presence of variables and their domains defined by other skills (e.g., acceleration
$a \in [0, 4]$). In the following, we formally define a skill. Let $\mathcal{X}$ denote the
universe of continuous variables. The syntactic domain of a skill is defined as follows.


Definition 1 (Skill). A skill is a 5-tuple $(X_{def}, X_{req}, \alpha, \tau, \Phi)$, where
• $X_{def} \subseteq \mathcal{X}$ is a finite set of variables defined in the hybrid program $\alpha$,
• $X_{req} \subseteq \mathcal{X}$ is a finite set of variables required by the hybrid program $\alpha$,
• $\alpha$ is the (possibly empty) hybrid program (cf. Eq. 1) over variables in $X_{def} \cup X_{req}$,
• $\tau \in$ Type is the associated type,
• $\Phi = \{\phi_1, \ldots, \phi_m\}$ is a finite set of safety guarantees in first-order logic over
  variables in $X_{def} \cup X_{req}$ (cf. Eq. 2).
To be well-formed, we require that the sets of defined and required variables of
a skill are disjoint (i.e., $X_{def} \cap X_{req} = \emptyset$). To access a skill's attribute, we use
the '.' (dot) operator (e.g., $s.\tau$ expresses the type of skill $s$).
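
As a rough illustration of Definition 1, the following Python sketch models a skill as an immutable record and checks the stated well-formedness condition that defined and required variables are disjoint. The class and field names, the `name` label, and the representation of programs and guarantees as plain strings are assumptions made for this sketch; they are not taken from the authors' tool.

```python
from dataclasses import dataclass

# The six skill types listed in the text.
SKILL_TYPES = {
    "observable behavior", "action", "perception",
    "planning", "sensor", "actuator",
}


@dataclass(frozen=True)
class Skill:
    name: str              # hypothetical label, not part of the 5-tuple
    x_def: frozenset       # variables defined by the hybrid program
    x_req: frozenset       # variables required by the hybrid program
    program: str           # hybrid program as text; "" models the empty program
    type: str              # one of SKILL_TYPES
    guarantees: frozenset  # safety guarantees, kept as formula strings

    def is_well_formed(self) -> bool:
        """Known type, and defined and required variables are disjoint."""
        return self.type in SKILL_TYPES and self.x_def.isdisjoint(self.x_req)


# Example following the text: velocity v is defined here, acceleration a is required.
drive = Skill(
    name="drive",
    x_def=frozenset({"v"}),
    x_req=frozenset({"a"}),
    program="{v' = a & v >= 0}",
    type="action",
    guarantees=frozenset({"v >= 0 & v <= 60"}),
)
print(drive.is_well_formed(), drive.type)
```

Attribute access with the dot operator (`drive.type`) mirrors the notation used in the definition.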


3.2 Formalizing Skill Graphs 


We formalize skill graphs as directed acyclic graphs comprising a set of skills 
(i.e., nodes), which are connected through directed edges representing their
dependencies. We denote by $\mathcal{S}$ the universe of all skills and define the syntactic
domain of skill graphs as follows.


Definition 2 (Skill Graph). A skill graph is given by $G = (S, r, E)$, where
• $S \subseteq \mathcal{S}$ is a finite set of skills,
• $r \in S$ is the root skill,
• $E \subseteq S \times S$ is a set of directed edges between skills. We denote $(s_c, s_p) \in E$ as
  $s_c \prec s_p$, meaning that $s_c$ is a child of $s_p$.

$\tau_s \backslash \tau_t$ observable action actuator planning perception sensor
observable vg vg - [V ~ 7 
action - vg vg vg vg - 
planning - - m v4 s a 
perception a > = = vg ~ 


Table 1: Valid types of a child skill $t$ for a skill $s$ (i.e., $t \prec s$).


A skill graph is an acyclic directed graph with exactly one root skill r. To 
guarantee that skill graphs are well-formed, we impose specific constraints. We 
formally introduce the path between two skills as follows. 


Definition 3 (Path). Let $E$ be a set of edges and $s_1, \ldots, s_l \in S$ skills of a skill
graph. A path of length $l-1$ is a (possibly empty) sequence of $l-1$ edges $(s_1, s_2),
(s_2, s_3), \ldots, (s_{l-1}, s_l) \in E$ denoted by $\pi_{s_1 \to s_l} = [(s_1, s_2), (s_2, s_3), \ldots, (s_{l-1}, s_l)]$.
We say that a path between skills $s, s' \in S$ exists if $\pi_{s \to s'}$ is non-empty, and does
not exist otherwise.


As mentioned before, each skill has an assigned type. Based on our definition of 
a well-formed graph, we enforce that only skills with particular types can form 
valid parent-child relationships (cf. Table 1). For instance, for two skills $s, s' \in S$,
if $s \prec s'$ holds and skill $s'$ is of type perception, then skill $s$ is only allowed to
have type sensor or perception.


Definition 4 (Well-Formed Skill Graph). Let $G = (S, r, E)$ be a skill graph.
$G$ is well-formed if and only if
• each skill $s \in S \setminus \{r\}$ in a skill graph has at least one parent skill $s' \in S$
  (i.e., $\{s' \in S \mid s \prec s'\} \ne \emptyset$) and there exists at least one path from skill $s$ to
  root skill $r$,
• for each edge $(s, s') \in E$, skills $s, s'$ satisfy the typing restriction depicted in
  Table 1,
• for each skill $s \in S$ and variable $x \in s.X_{req}$ there exists a path $\pi_{s' \to s}$ from
  a skill $s' \in S$ that introduces variable $x$ (i.e., $x \in s'.X_{def}$),
• for each pair of distinct skills $s, s' \in S$, the sets of defined variables are disjoint (i.e.,
  $s.X_{def} \cap s'.X_{def} = \emptyset$),
• for each skill $s$ in $G$, formula $\bigwedge_{\phi \in s.\Phi} \phi$ must be satisfiable.
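
The structural conditions of Definition 4 can be sketched as a small Python check. This is a minimal sketch under several assumptions: skills are identified by hypothetical names, edges are (child, parent) pairs, the typing relation is passed in as a dictionary (only the rule stated in the text for perception parents is used in the example), and the satisfiability condition on the safety guarantees is left out because it needs a solver.

```python
from itertools import combinations


def reaches(edges, src, dst):
    """True if a non-empty path of (child, parent) edges leads from src to dst."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for child, parent in edges:
            if child == node and parent not in seen:
                if parent == dst:
                    return True
                seen.add(parent)
                stack.append(parent)
    return False


def is_well_formed(skills, root, edges, allowed):
    """skills: name -> {"type", "x_def", "x_req"}; allowed: parent type -> child types."""
    # Every non-root skill needs a parent and a path to the root.
    for s in skills:
        if s != root and not reaches(edges, s, root):
            return False
    # Every edge must respect the typing restriction.
    for child, parent in edges:
        if skills[child]["type"] not in allowed.get(skills[parent]["type"], set()):
            return False
    # Every required variable must be introduced by a skill with a path to the requirer.
    for s, attrs in skills.items():
        for x in attrs["x_req"]:
            if not any(x in skills[t]["x_def"] and reaches(edges, t, s) for t in skills):
                return False
    # Defined variables of distinct skills must be disjoint.
    for s, t in combinations(skills, 2):
        if skills[s]["x_def"] & skills[t]["x_def"]:
            return False
    return True


skills = {
    "Radar": {"type": "sensor", "x_def": {"d"}, "x_req": set()},
    "Perceive": {"type": "perception", "x_def": {"obj"}, "x_req": {"d"}},
}
edges = {("Radar", "Perceive")}
allowed = {"perception": {"perception", "sensor"}}  # rule stated in the text
print(is_well_formed(skills, "Perceive", edges, allowed))
```

Removing the single edge makes the check fail, since the sensor skill then has no parent and the required variable `d` is no longer reachable.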


Remark. Unlike behavioral models, skill graphs as defined here do not suggest an 
execution order of skills on the same level (i.e., child skills). The reason is twofold. 
First, the information needed for the scheduling may be incomplete at design 
time (i.e., concrete hardware and scheduling parameters). Second, the intent of 
skill graphs is to abstract away from implementation details, while providing 
guarantees about the correctness of defined safety requirements. In Section 4.2, 
we illustrate how to assemble the decomposed hybrid programs of a skill graph 
to a complete hybrid program, while being safe with respect to our chosen level 
of abstraction. 

Fig. 3: Example of a composition of two skill graphs.

3.3 Composition of Skill Graphs 


From the perspective of software engineering practices, an advantage of skill 
graphs is their modular nature. Multiple skill graphs can be designed in isolation, 
but may also share the same skills. To model and verify more complex skill graphs 
and to prevent unnecessary redundancy, the idea is to adequately reuse previously 
designed skill graphs and subsequently compose them together. This method 
further supports the identification and location of design mistakes, maintenance 
of skill graphs in general, and also enables the distribution of modeling tasks in 
multi-team software development. 

Our composition technique of skill graphs is inspired by superimposition [8], 
a simple process that composes two graphs recursively together by merging their 
substructures. Starting from the root skill of one of the skill graphs, skills at 
the same level fulfilling defined criteria can then be composed to form a new 
resulting skill. Starting from a joint root skill of two different skill graphs $G_1 =
(S_1, r, E_1)$ and $G_2 = (S_2, r, E_2)$, two skills $s_1 \in S_1$ and $s_2 \in S_2$ are composed to
a new skill $s$ if:

- both paths, $\pi_{s_1 \to s_1'}$ and $\pi_{s_2 \to s_2'}$, exist and $s_1'$, $s_2'$ are already composed,
- $s_1$ and $s_2$ have an equal type and equal sets of defined and required variables,
- and either one of the two hybrid programs is empty or both are identical.

For illustration, Fig. 3 depicts an abstraction of the composition of two skill 
graphs. Both skill graphs share the identical skills A and C. First, the root skill 
A of both skill graphs is superimposed, and second, skill C is superimposed after 
identifying that in both skill graphs there exists a path to a skill already subject to 
composition (i.e., A). In the following, we call two skills from different skill graphs 
composable if they are subject to the composition as explained here. The resulting 
skill s receives all the properties (i.e., variables, type, and hybrid program) from 

the composable skills and additionally the union of their safety guarantees: 


Definition 5 (Composition of Skills). Let $s_1 \in S_1$ and $s_2 \in S_2$ be two
composable skills. The binary composition of $s_1$ and $s_2$ then produces the skill

\[ s_1 \oplus s_2 = (s_1.X_{def},\ s_1.X_{req},\ s_1.\alpha,\ s_1.\tau,\ s_1.\Phi \cup s_2.\Phi). \quad (3) \]
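
The composition of two skills can be sketched in Python as follows. The sketch only checks the two local criteria (equal type and variable sets, and programs that are identical or empty on one side); the path criterion depends on the surrounding graphs and is omitted. The record type, the function names, and the example formulas are assumptions for illustration.

```python
from typing import NamedTuple


class Skill(NamedTuple):
    x_def: frozenset       # defined variables
    x_req: frozenset       # required variables
    program: str           # hybrid program as text; "" models the empty program
    type: str
    guarantees: frozenset  # safety guarantees as formula strings


def locally_composable(s1, s2):
    """Equal type and variable sets; programs identical or one of them empty."""
    same_signature = (
        s1.type == s2.type and s1.x_def == s2.x_def and s1.x_req == s2.x_req
    )
    programs_fit = s1.program == "" or s2.program == "" or s1.program == s2.program
    return same_signature and programs_fit


def compose(s1, s2):
    """Keep the attributes of s1 and take the union of both guarantee sets (Eq. 3)."""
    if not locally_composable(s1, s2):
        raise ValueError("skills are not composable")
    return Skill(s1.x_def, s1.x_req, s1.program, s1.type, s1.guarantees | s2.guarantees)


a1 = Skill(frozenset({"v"}), frozenset({"a"}), "{v' = a}", "action", frozenset({"v >= 0"}))
a2 = Skill(frozenset({"v"}), frozenset({"a"}), "{v' = a}", "action", frozenset({"v <= 60"}))
merged = compose(a1, a2)
print(sorted(merged.guarantees))
```

Following Eq. 3 literally, the program of the first skill is kept; when only the second skill carries a non-empty program, a practical implementation would presumably keep that one instead.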


The binary composition of two skill graphs is then formally defined as follows,
where $M = \{(s_1, s_2) \in S_1 \times S_2 \mid s_1 \text{ and } s_2 \text{ are composable}\}$ is the set of
composable skills and $f$ is a function that maps every skill in $(S_1 \cup S_2) \setminus \{s_1, s_2 \in
S_1 \cup S_2 \mid (s_1, s_2) \in M\}$ to itself and maps all skills $s_1, s_2$ with $(s_1, s_2) \in M$ to a
new skill $s = s_1 \oplus s_2$.

Definition 6 (Composition of Skill Graphs). Let $G_1 = (S_1, r_1, E_1)$ and $G_2
= (S_2, r_2, E_2)$ be two well-formed skill graphs with $r_2 \in S_1$. The composition of
$G_1$ and $G_2$ then produces the skill graph

\[ G_1 \circ G_2 = (S, f(r_1), E) \quad (4) \]

where
- $S = \{f(s) \mid s \in S_1 \cup S_2\}$,
- for every $s, s' \in (S_1 \cup S_2)$, there exists an edge $(f(s), f(s')) \in E$ if and only
  if there exists an edge $(s, s') \in (E_1 \cup E_2)$.


A mathematical convenience of our definition of composition is that it requires 
the root skill of one skill graph to be present in the second skill graph. This is 
not a severe limitation, as it is always possible to add an artificial root to one 
skill graph (or both) with respect to well-formedness. 


4 Compositional Verification of Skill Graphs 


In this section, we formalize the generation of verification conditions to check 
correctness of skills in the context of a skill graph, show how correctness results 
transfer to the composition of skill graphs, and discuss how this methodology 
can be integrated into the development process for cyber-physical systems. 


4.1 Verification Condition Generation 


Our verification procedure relies on assume-guarantee reasoning. Thus, to verify
whether a skill $s$ in the context of a skill graph adheres to its safety guarantees
$s.\Phi$, we have to construct two logical conditions: (1) necessary assumptions on
a skill's behavior denoted by $\mathit{assume}_s$, and (2) the overall safety condition in the
context of the skill graph denoted by $\mathit{safe}_s$. For instance, $\mathit{assume}_s$ for leaf skills
evaluates trivially to true, but child skills impose constraints on their parent skills
through their safety guarantees. Both conditions can be computed automatically
based on the skill's dependencies and the manually defined safety guarantees
$s.\Phi$. The overall verification condition then becomes $\mathit{assume}_s \rightarrow [s.\alpha]\,\mathit{safe}_s$ (cf.
Sec. 2). In the following, we describe how both conditions are constructed.

In the context of a skill graph, a particularity to deal with is that a skill 
may require variables introduced in a distant skill (i.e., path length greater than 
one), possibly with numerous updates along the path. These variables may be 
unknown in direct children, so it is not possible to only define the assumption (i.e.,
$\mathit{assume}_s$) of a skill $s$ as the conjunction of the safety guarantees of all children
(i.e., $\bigwedge \mathit{safe}_{s'}$ with $s' \prec s$). In Fig. 4, we illustrate this problem and its solution
on a simple skill graph comprising three skills. 

Skill #1 introduces variables A and B including safety guarantees on them
in $\phi_1$. Typical for assume-guarantee reasoning, $\phi_1$ becomes the assumption for
all parent skills (i.e., Skill #2 in this case). However, the safety guarantee of
Skill #2 (i.e., $\phi_2$) states only a modification of variable A and not B, but Skill
#3 may indeed need the information of the current domain of variable B to be
verifiable. To keep assume-guarantee propagation intact, we resolve this issue by
additionally encoding all safety guarantees that remain valid for a skill in its
safety guarantee $\mathit{safe}_s$ (highlighted in blue). In the following, we introduce our
formalization.

Fig. 4: Computation of $\mathit{assume}_s$ and $\mathit{safe}_s$.

The definitions of both formulas, $\mathit{assume}_s$ and $\mathit{safe}_s$, are mutually recursive.
The logical formula $\mathit{assume}_s$ for a skill $s$ results from the conjunction of the
overall safety guarantees $\mathit{safe}_{s'}$ of all children $s' \prec s$. The assumption for skills
with no children evaluates trivially to true.

\[ \mathit{assume}_s = \bigwedge_{s' \prec s} \mathit{safe}_{s'} \quad (5) \]

To compute the overall safety guarantee $\mathit{safe}_s$, we exploit that $\mathit{assume}_s$ exhibits
an overapproximation on the current state of the required variables for a skill $s$
prior to executing the hybrid program $s.\alpha$. As the behavior of a skill may change
the initial state, we discard all clauses in $\mathit{assume}_s$ sharing a variable with one of
the user-provided safety guarantees in $s.\Phi$. The remaining clauses become part
of $\mathit{safe}_s$. For instance, in Fig. 4, Skill #3 guarantees a change of variable B in
$\phi_3$. Thus, only clauses of $\mathit{assume}_3$ without mentioning B transfer to $\mathit{safe}_3$. For
mathematical convenience, we denote the conjunction of all safety guarantees
of a skill by the logical formula $\phi_s = \bigwedge_{\phi \in s.\Phi} \phi$ and the set of assumptions of a
skill in a skill graph by the set $A_s = \{\psi_1, \ldots, \psi_n \mid \mathit{assume}_s = \psi_1 \wedge \ldots \wedge \psi_n\}$.
Furthermore, $var(\cdot)$ denotes the set of variables of a logical formula. The
overall safety guarantee of a skill is then computed as follows.

\[ \mathit{safe}_s = \phi_s \wedge \Big( \bigwedge_{\psi \in A_s \,\wedge\, var(\phi_s) \cap var(\psi) = \emptyset} \psi \Big) \quad (6) \]

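
The mutual recursion between the assumption and the overall safety guarantee can be sketched in Python. In this sketch a clause is a pair of a formula string and the set of variables it mentions; the skill names, the chain of three skills, and the concrete bounds are hypothetical and only loosely follow the situation described for Fig. 4.

```python
def assume(s, children, guarantees):
    """Clauses of the assumption of s: all overall guarantees of its children (Eq. 5)."""
    clauses = []
    for child in children.get(s, []):
        for clause in safe(child, children, guarantees):
            if clause not in clauses:
                clauses.append(clause)
    return clauses


def safe(s, children, guarantees):
    """Own guarantees plus inherited clauses that share no variable with them (Eq. 6)."""
    own = list(guarantees[s])
    own_vars = set()
    for _, variables in own:
        own_vars |= variables
    inherited = [c for c in assume(s, children, guarantees) if not (c[1] & own_vars)]
    return own + inherited


# Skill 1 is a child of skill 2, which is a child of skill 3 (hypothetical values).
children = {"skill2": ["skill1"], "skill3": ["skill2"]}
guarantees = {
    "skill1": [("0 <= A & A <= 10", frozenset({"A"})), ("0 <= B & B <= 5", frozenset({"B"}))],
    "skill2": [("A <= 8", frozenset({"A"}))],
    "skill3": [("B <= 3", frozenset({"B"}))],
}

print([f for f, _ in assume("skill3", children, guarantees)])
print([f for f, _ in safe("skill3", children, guarantees)])
```

The bound on B introduced by the first skill survives in the overall guarantee of the second skill, so it is available as an assumption for the third skill, where it is then replaced by that skill's own guarantee on B.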
We can now define the validity of a skill graph as follows. 


Definition 7 (Valid Skill Graph). Let $G = (S, r, E)$ be a well-formed skill
graph. We say that skill graph $G$ is valid if and only if $\forall s \in S$ formula $\mathit{assume}_s$
is satisfiable and formula $\mathit{assume}_s \rightarrow [s.\alpha]\,\mathit{safe}_s$ is valid. We denote by $s \models_G s.\Phi$
the validity of a skill $s$ in a skill graph $G$ with respect to its safety guarantees
and by $\models G$ the validity of the entire skill graph (i.e., $\models G \equiv \forall s \in S,\ s \models_G s.\Phi$).


The upcoming important theorem states that the individual validity of two 
skill graphs also transfers to the validity of their composition. However, based on 
Def. 6, composition may also lead to an invalid skill graph if the assumption of
a skill in the new skill graph is not satisfiable (e.g., possible in case of diamond 
structures). Therefore, we require satisfiability checks for the computed assump- 
tions and define the compatibility between two skill graphs as follows. 


Definition 8 (Compatible Skill Graphs). Let $G_1$ and $G_2$ be two well-formed
skill graphs. We say that $G_1$ and $G_2$ are compatible if the following holds.


- $G_1 \circ G_2$ is a well-formed skill graph,
- for each skill $s$ in $G_1 \circ G_2$, formula $\mathit{assume}_s$ is satisfiable.


Theorem 1 (Composition of Skill Graphs Retains Validity). Let $G_1$ and
$G_2$ be two compatible skill graphs and $G = G_1 \circ G_2$ their composition. Then, $G$
is valid if $G_1$ and $G_2$ are valid (i.e., $\models G$ if $\models G_1$ and $\models G_2$).


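Proof. Let $s_1$ and $s_2$ be two composed skills and $s = s_1 \oplus s_2$ their composition.
Following Def. 6, the verification condition for $s$ becomes

\[ (\mathit{assume}_{s_1} \wedge \mathit{assume}_{s_2}) \rightarrow [s.\alpha](\mathit{safe}_{s_1} \wedge \mathit{safe}_{s_2}). \]

Based on the semantics of $d\mathcal{L}$ [31], condition $(\Psi \rightarrow [\alpha]\Phi_1) \wedge (\Psi \rightarrow [\alpha]\Phi_2) \leftrightarrow (\Psi \rightarrow
[\alpha](\Phi_1 \wedge \Phi_2))$ holds. As the hybrid programs of $s_1$ and $s_2$ are identical (or at least
one of them is empty), the resulting two conditions to check are the following:

\[ (1)\quad (\mathit{assume}_{s_1} \wedge \mathit{assume}_{s_2}) \rightarrow [s_1.\alpha]\,(\mathit{safe}_{s_1}) \]

\[ (2)\quad (\mathit{assume}_{s_1} \wedge \mathit{assume}_{s_2}) \rightarrow [s_2.\alpha]\,(\mathit{safe}_{s_2}) \]

Satisfiability of $(\mathit{assume}_{s_1} \wedge \mathit{assume}_{s_2})$ follows from Def. 8. Then, validity of both
conditions follows from Def. 7 and, consequently, $\models G$ holds.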


4.2 Assembling Hybrid Programs in a Skill Graph


Skill graphs decompose the system into smaller parts. Likewise, the hybrid pro- 
gram that represents the complete behavior is also distributed over the skill graph. 
Now that we have defined the structure and behavior of single skills in the context 
of a skill graph, we define how we can construct the complete behavior of a skill 
as a single monolithic hybrid program. The resulting hybrid program is then a 
complete representation of the skill’s behavior while also retaining all safety guar- 
antees without the need of re-verifying skills or even entire skill graphs. We start 
by giving a definition on how hybrid programs of skills are assembled together. 


Definition 9 (Hybrid Program Assembly). Let $G = (S, r, E)$ be a skill
graph, $HP$ the set of all hybrid programs, and let $s \in S$ denote an arbitrary
skill of $G$. A hybrid program assembly of $s$ is a function $\rho : S \rightarrow HP$, which is
recursively defined as follows.

\[
\rho(s) =
\begin{cases}
s.\alpha & \text{if } s \text{ has no children (i.e., } \neg\exists s' \in S : s' \prec s\text{)} \\
\big(\bigcup_{s' \prec s} \rho(s')\big);\ s.\alpha & \text{otherwise}
\end{cases}
\]
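
The recursive assembly can be sketched as a short Python function that builds the assembled program as text. Nondeterministic choice is written `++` here, following the ASCII syntax of KeYmaera X; the skill names and the placeholder program strings are hypothetical, and the parenthesization is a choice made for this sketch.

```python
def assemble(s, children, program):
    """Text of the assembly of s: choice over the child assemblies, then s's own program."""
    kids = children.get(s, [])
    if not kids:
        # Leaf skill: the assembly is just the skill's own hybrid program.
        return program[s]
    parts = [f"({assemble(c, children, program)})" for c in kids]
    # A single child needs no choice operator; several children are joined with ++.
    choice = parts[0] if len(parts) == 1 else "(" + " ++ ".join(parts) + ")"
    return f"{choice}; {program[s]}"


children = {"root": ["a", "b"], "a": ["c"]}
program = {"root": "r", "a": "pa", "b": "pb", "c": "pc"}
print(assemble("root", children, program))  # (((pc); pa) ++ (pb)); r
```

The children of a skill are combined by choice rather than sequenced, which matches the remark that skill graphs do not prescribe an execution order among skills on the same level.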

The motivation is that such assemblies are safe to be used in other contexts,
such as code generation for the validation of prototypes or monitor generation. 
Assuming a valid skill graph G, the following theorem guarantees that any hybrid 
program assembly over skills in G retains the respective safety guarantees. 


Theorem 2 (Safety Compliance of Hybrid Program Assemblies). Let
$G = (S, r, E)$ be a valid skill graph and let $s \in S$ denote an arbitrary skill of $G$.
Then, formula $[\rho(s)]\,\mathit{safe}_s$ is valid.


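Proof. We proceed by induction on the skills of skill graph $G$. For the basis
step, we assume that $s$ has no children (i.e., $\neg\exists s' \in S : s' \prec s$). Because
$[\rho(s)]\,\mathit{safe}_s = [s.\alpha]\,\mathit{safe}_s$ and $G$ is a valid skill graph, it follows from Def. 7
that formula $[\rho(s)]\,\mathit{safe}_s$ is valid. From now on, we assume that $s$ has children.
Our induction hypothesis is that if for each skill $s' \prec s$ program assembly $\rho(s')$
satisfies $\mathit{safe}_{s'}$, then hybrid program assembly $\rho(s)$ satisfies $\mathit{safe}_s$:

\[
\begin{array}{lll}
(\mathrm{IH}) & & \big(\bigwedge_{s' \prec s} [\rho(s')]\,\mathit{safe}_{s'}\big) \rightarrow [\rho(s)]\,\mathit{safe}_s \\
(1) & \Leftrightarrow & \big(\bigwedge_{s' \prec s} [\rho(s')]\,\mathit{safe}_{s'}\big) \rightarrow \big[\bigcup_{s' \prec s} \rho(s');\ s.\alpha\big]\,\mathit{safe}_s \\
(2) & \Leftrightarrow & \big(\bigwedge_{s' \prec s} [\rho(s')]\,\mathit{safe}_{s'}\big) \rightarrow \big[\bigcup_{s' \prec s} \rho(s')\big][s.\alpha]\,\mathit{safe}_s \\
(3) & \Leftrightarrow & \big(\bigwedge_{s' \prec s} [\rho(s')]\,\mathit{safe}_{s'}\big) \rightarrow \bigwedge_{s' \prec s} [\rho(s')][s.\alpha]\,\mathit{safe}_s \\
(4) & \Leftrightarrow & \big(\bigwedge_{s' \prec s} \mathit{safe}_{s'}\big) \rightarrow [s.\alpha]\,\mathit{safe}_s \\
(5) & \Leftrightarrow & \mathit{assume}_s \rightarrow [s.\alpha]\,\mathit{safe}_s
\end{array}
\]

Transformation step (1) follows from substituting $\rho(s)$ with its definition given
in Def. 9. Steps (2)-(4) are again based on the semantics of $d\mathcal{L}$ [31]. Step (2)
follows from the sequential composition axiom $[a; b]P \leftrightarrow [a][b]P$, step (3) from
the nondeterministic choice axiom $[a \cup b]P \leftrightarrow [a]P \wedge [b]P$, and step (4) from
monotonicity. Because $G$ is a valid skill graph, validity of $\mathit{assume}_s \rightarrow [s.\alpha]\,\mathit{safe}_s$
follows again from Def. 7. Consequently, $[\rho(s)]\,\mathit{safe}_s$ is valid.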


4.3 Integration into the Software Development Process 


In Figure 5, we summarize the methodology for modeling and verifying skill 
graphs. The main idea is that the safety verification of skill graphs modeled in 
isolation transfers to the composition of compatible skill graphs. This (a) eases 
the modeling process, as smaller models tend to be less complex and easier to 
repair, (b) fosters reusability, which is known to be cost-effective and less error- 
prone, and (c) is promising for scaling the verification to large skill graphs. 

In particular, the methodology consists of five major parts. In the first
part (1), practitioners define and model skills together with their hybrid programs
and relevant safety guarantees in isolation and subsequently connect them
to form well-formed skill graphs (if possible). In the second part (2), for each
skill $s$ in a skill graph, the assumption $\mathit{assume}_s$ and safety guarantee $\mathit{safe}_s$ are
computed by evaluating the context of the skill in the skill graph. The third part
(3) uses the identified assumptions and the safety guarantee to validate each skill
in a skill graph individually. If each skill is proven valid (cf. Theorem 1), the
complete skill graph is proven valid and can be put into a repository to be reused.
Following Theorem 2, all program assemblies over skills in this skill graph retain
the respective safety guarantees. The fourth part (4) becomes relevant if two
skill graphs are composed to represent a more complex task of a cyber-physical
system. In this case, compatibility of the skill graphs is checked and, if
successful, the validity of the composed skill graph is established (5). The final
part (6) is relevant in the presence of unsuccessful proof attempts. If validity of
a skill graph or the composition of multiple skill graphs cannot be established,
practitioners need to identify and fix mistakes in their models. Typically, the
complexity of localizing design mistakes is reduced with our methodology, as it
is explicitly known which exact skills in a skill graph with respect to their safety
guarantees could not be verified.

Fig. 5: Methodology of modeling and verifying skill graphs.


5 Evaluation and Discussion 


We evaluate our skill-based verification approach on a case study to answer the 

following two research questions. 

RQ-1 How does the skill-based methodology compare to monolithic modeling and 
verification? 

RQ-2 To what extent can skill-based compositional verification reduce the veri- 
fication effort? 


5.1 Open-Source Implementation 


We implemented skill-based verification in a tool with the name Skeditor.
The implementation is written in Java as an Eclipse plug-in based on Graphiti [13],
a framework for developing diagram editors in the context of model-driven development.
The prototype allows practitioners to model and annotate well-formed
skill graphs with safety guarantees as described in Section 3.

Thereupon, we implemented our compositional verification approach as described
in Section 4. Skeditor allows synthesizing hybrid programs of specific
skills with respect to their dependencies in the skill graph. Compliance
checks of the provided safety guarantees are performed by employing the deductive
theorem prover KeYmaera X [14] in version 4.7.3. Skeditor and all
experimental results can be found online.³ We use Skeditor to answer
research questions RQ-1 and RQ-2.

Fig. 6: Complete skill graph expressing an automated vehicle follow mode.


5.2 Case Study: Vehicle Follow Mode 


To illustrate the practicality of our approach, we model and verify the vehicle follow
mode of an automated protective vehicle as adopted from Nolte et al. [26] and
depicted in Figure 6. The aim of that work was to develop an unmanned protective vehicle
that is able to drive on the hard shoulder autonomously (i.e., without any human
interaction). On the lowest level, the skill graph consists of three sensors (i.e.,
Radar, Camera, and Inertial sensor) to perceive information from the environment.
Additionally, three actuators (i.e., Brake system, Powertrain, and
Steering system) represent concrete technical aspects. These skills propagate
information about typical properties of a concrete model of a vehicle (e.g., the
maximal deceleration). As highlighted with two shades of gray, this skill graph is
dividable into two separate skill graphs, which we refer to as $G_1$ and $G_2$. $G_1$ has Keep
distance to leading vehicle as the root skill, which is responsible for ensuring
a minimum distance to a leading vehicle. $G_2$ has Follow hard shoulder as root
skill, which is responsible for ensuring the vehicle's position inside the lane markings
on a road. Skills shared by both skill graphs are highlighted with both shades.


³ https://github.com/TUBS-ISF/Skeditor

Skill                                     Requirement

Follow hard shoulder                      Vehicle deviates from the center of the lane by at most
                                          half the lane width
Control lateral dynamics                  Lateral controller must guarantee overshoot of less than
                                          25 cm
Yaw                                       Vehicle yaw rate must not exceed 0.3 rad s⁻¹
Control longitudinal dynamics             Vehicle speed must not exceed 2.7 m s⁻¹
Accelerate                                Acceleration must not exceed 4 m s⁻²
Decelerate                                Vehicle must at least provide a deceleration of 5 m s⁻²
Keep distance to leading vehicle          Vehicle must keep a minimum distance of 10 m to leading
                                          vehicle
Select target object                      Object recognition must always select an object of lateral
                                          position of x > 10 m
Perceive movable objects                  Object recognition must track vehicles of relative speeds
                                          between 0 and 60 m s⁻¹
Estimate angle and distance to marking    Angle to lane marking must be extracted with maximum
                                          error of ±0.5 degrees and distance to lane marking must
                                          be extracted with maximum error of ±3 cm
Perceive hard shoulder                    Image processing must extract right edge of shoulder
                                          marking with a maximum error of 20 cm
Estimate motion                           Vehicle velocity must be estimated with a maximal error
                                          of ±0.03 m s⁻¹

Table 2: Specified safety requirements for the vehicle follow mode as adopted
from Nolte et al. [26].


The overall procedure Follow mode (i.e., the composition $G_1 \circ G_2$) requires
a combination of autonomously following a leading vehicle (i.e., skill $s_1$) and
following the lane marking (i.e., skill $s_{13}$). The informal safety guarantees for the
skill graph of our case study are adopted from Nolte et al. [26] and illustrated in
Table 2. Requirements are typically given informally, which is why we translated
them to their formal counterpart. For our case study, we focus on four particular
skills, as these are the only non-trivial skills in our case study that comprise both
the vehicle's dynamics and a control algorithm. Namely, these skills are Control
longitudinal dynamics ($s_6$), Control lateral dynamics ($s_{14}$), Follow hard
shoulder ($s_{13}$), and Keep distance to leading vehicle ($s_1$).


Example 2 Consider skill Control longitudinal dynamics ($s_6$) in the context
of the overall skill graph. Skill $s_6$ comprises the dynamic system for the
longitudinal motion of the vehicle while depending on skills Accelerate and
Decelerate as well as the perception skill Estimate motion. The control algorithm
of this skill as part of the hybrid program complies with the safety requirements
as given in Table 2 (e.g., velocity ($v_s$) must not exceed 2.7 m s⁻¹). Preconditions
for this skill are propagated from skills Estimate motion, Accelerate, and
Decelerate, and guarantee that the vehicle provides a maximal deceleration of
5 m s⁻² (B) and a maximal acceleration of 4 m s⁻² (A). Table 3 summarizes all
attributes of skill $s_6$.

$X_{def}$   $\{x, v, v_{max}\}$
$X_{req}$   $\{a, A, B, ep, t\}$
$\alpha$    $init \rightarrow [(ctrl;\ dyn)^{*}](guar)$
            $init \equiv v \ge 0 \wedge v \le v_{max} \wedge A \ge 0 \wedge A \le 4 \wedge B \ge 5 \wedge v_{max} = 2.7$
            $ctrl \equiv (?(v_{max} - v \le margin);\ a := *;\ ?(-B \le a \le 0))$
                  $\cup\ (?(v_{max} - v \ge margin);\ a := *;\ ?(-B \le a \le A))$
            $margin \equiv ep \cdot A$
            $dyn \equiv t := 0;\ \{x' = v,\ v' = a,\ t' = 1 \,\&\, v \ge 0 \wedge t \le ep\}$
            $guar \equiv v \le v_{max}$
$\tau$      action
$\Phi$      $\{v_s \le v_{max}\ (2.7\ \mathrm{m\,s^{-1}})\}$


Table 3: Attributes for skill Control longitudinal dynamics ($s_6$).
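
The controller summarized in Table 3 can be illustrated with a small simulation. The sketch below reads the two guards as "brake only when the gap to the maximal velocity is at most ep · A, otherwise choose any acceleration between −B and A"; this reading of the comparison operators, the cycle time `EP = 0.05`, the sampling of accelerations and durations, and all function names are assumptions made here. A simulation can only sample runs and does not replace the proof in KeYmaera X.

```python
import random

V_MAX, A, B = 2.7, 4.0, 5.0  # bounds taken from Table 2 (A <= 4, B >= 5, v_max = 2.7)
EP = 0.05                    # hypothetical maximal cycle time in seconds


def ctrl(v, rng):
    """Pick an acceleration; accelerating is allowed only with enough headroom."""
    margin = EP * A
    if V_MAX - v >= margin:
        return rng.uniform(-B, A)
    return rng.uniform(-B, 0.0)


def dyn(v, a, rng):
    """Evolve v' = a for some duration t <= EP while staying in the domain v >= 0."""
    t = rng.uniform(0.0, EP)
    if a < 0.0:
        t = min(t, v / -a)  # stop at the latest when v reaches zero
    return max(0.0, v + a * t)  # max() only guards against rounding below zero


def simulate(steps, seed):
    rng = random.Random(seed)
    v = rng.uniform(0.0, V_MAX)  # initial state satisfying 0 <= v <= v_max
    trace = [v]
    for _ in range(steps):       # models the loop (ctrl; dyn)*
        v = dyn(v, ctrl(v, rng), rng)
        trace.append(v)
    return trace


trace = simulate(10_000, seed=1)
print(max(trace) <= V_MAX + 1e-9)
```

The margin works because accelerating with at most A for at most ep seconds raises the velocity by at most ep · A, which is exactly the headroom the guard demands.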


5.3 Results 


All measurements were conducted on an Intel i7-6600U CPU @ 2.60 GHz with
12 GB RAM, and Z3 [12] in version 4.6.0 was used as the underlying solver for
KeYmaera X in version 4.7.3.


RQ-1: How does the skill-based methodology compare to monolithic
modeling and verification? We modeled the overall behavior of $G_1 \circ G_2$ as a
monolithic model (i.e., following the hard shoulder and following a leading vehicle
in concert) as described in Section 4.2. As mentioned before, the skill-based
approach has a high reuse potential. Each skill needs to be verified only once,
and the verification results can be reused in other skill graphs (cf. Theorem 1).
While in case of a change of parameters or an update of control algorithms
the monolithic model has to be re-modeled and re-verified completely, a change
impact analysis identifying only affected skills may reduce the re-verification
effort even further for the compositional approach. Importantly, skill $s_{13}$ and
the monolithic model could only be verified interactively, whereas skills $s_1$, $s_{14}$,
and $s_6$ were verified fully automatically with the automatic proof search of
KeYmaera X. Chances of an automatic re-verification are thus higher with
the skill-based methodology.

An important hypothesis of ours is that skill-based verification is more
effective in discovering modeling defects compared to a monolithic model. To get
some insights into this hypothesis, we developed three initial experiments to
render the verification attempt invalid. We (1) changed the safety guarantee of skills,
(2) changed the control algorithm of skills, and (3) did a combination of both
and compared these results to the same changes performed in the monolithic
model. Following our methodology helped to trace and resolve defects effectively
with respect to this case study, whereas identifying multiple modeling defects
in the monolithic model became quickly intractable. During the resolution of
Scenario 3, re-verification had to be performed several times for the monolithic
model (i.e., resolving one conflict at a time), which emphasizes the advantage of
our compositional approach over the monolithic modeling. However, we do not
want to overclaim the importance of our insights, as more complex experiments
and a larger evaluation have to be conducted to adequately test whether our
hypothesis is significant.

                   Verified Skill           Proof steps
                   s1   s6   s13  s14       s1      s6      s13      s14      Σ_total   Σ_reuse
G1                 ✓    ✓    -    -         4,746   3,769   -        -        8,515     8,515
G2                 -    ✓    ✓    ✓         -       3,769   16,924   7,223    27,916    24,147
G1 ∘ G2            ✓    ✓    ✓    ✓         4,746   3,769   16,924   7,223    32,662    0*
total                                                                         69,093    32,662

*No re-verification of skills with Theorem 1

Table 4: Comparison of the verification effort for skill-based compositional
verification.


RQ-2: To what extent can skill-based compositional verification reduce
the verification effort? To answer RQ-2, we measured the verification effort
in proof steps for each of the four skills mentioned before per skill graph. In
Table 4, we summarize the results. Column Verified Skill describes which skill is
part of which skill graph and column Proof steps compares the number of proof
steps needed for each skill individually. A common scenario is to model and
verify each maneuver individually (i.e., each skill graph). The total verification
effort $\Sigma_{total}$ would then accumulate to 69,093 proof steps. Instead, our skill-based
approach allows reusing verification results for skill $s_6$ in skill graph $G_2$ and, per
Theorem 1, even the verification results for all skills in skill graph $G_1 \circ G_2$. Entries
highlighted in gray indicate that the respective skill could be reused instead of
re-verification. The compositional approach needs approximately 53% fewer proof
steps in our case study.
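
The totals reported above can be recomputed from the per-skill proof steps in Table 4. The short Python sketch below does this arithmetic; the dictionary layout and function names are illustrative choices, while the numbers are the ones given in the table.

```python
# Proof steps per skill as reported in Table 4.
proof_steps = {"s1": 4746, "s6": 3769, "s13": 16924, "s14": 7223}

# Skills contained in each skill graph, in the order the graphs are verified.
graphs = {
    "G1": ["s1", "s6"],
    "G2": ["s6", "s13", "s14"],
    "G1 o G2": ["s1", "s6", "s13", "s14"],
}


def effort_without_reuse():
    """Every skill is verified again in every graph that contains it."""
    return sum(proof_steps[s] for skills in graphs.values() for s in skills)


def effort_with_reuse():
    """Every skill is verified once; later occurrences reuse the result."""
    verified, total = set(), 0
    for skills in graphs.values():
        for s in skills:
            if s not in verified:
                total += proof_steps[s]
                verified.add(s)
    return total


total = effort_without_reuse()
reuse = effort_with_reuse()
saving = 100 * (1 - reuse / total)
print(total, reuse, round(saving))  # 69093 32662 53
```

The saving of about 53% corresponds to 1 − 32,662 / 69,093.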


6 Related Work 


Skill Graphs. Maurer [23] pioneered the concept of skills by introducing so- 
called abilities in vehicle guidance systems. Abilities are similar to skills, as they 
concisely describe the capabilities of a vehicle, and are intended to be permanently
monitored at run-time to enforce safety mechanics. Reschka et al. [32,33]
introduced skill graphs informally in their work giving definitions for skills and 
abilities in relation to autonomous vehicles. Nolte et al. [26] built upon this ap- 
proach by employing the informal concept of skill graphs for the development of 
self-aware automated road vehicles. We adopted their case study to evaluate our 
skill-based verification approach. 


Hybrid Systems and Verification of Cyber-Physical Systems. Hybrid 
systems [3] are a generalization of timed automata [4] and well-suited for mod- 
eling and verifying cyber-physical systems. Krishna et al. [20] show that using 
hybrid automata to model and verify cyber-physical systems is, in principle, fea- 
sible. Typically, hybrid systems are verified employing reachability analyses and 
model checking [2,21,22,35]. However, these techniques are not compositional
in general (i.e., modular verification of individual parts to establish correctness
of the entire system is not possible). They are also not intended to generate and
reuse proofs to increase trust in the system’s correctness, as is possible, for instance,
with theorem proving. To address this issue, we built our methodology upon
the notion of hybrid programs [30] and the theorem prover KeYmaera X [14],
which helped us to also satisfy the important property of compositionality in the 
modeling and formal verification of hybrid systems. We further extend this con- 
cept with skill graphs by modularizing the verification of complex driving tasks, 
such that the verification of the entire behavior is reduced to simpler sub-tasks 
and compatibility checks. 

Finally, there exists a seamless connection to the work conducted by Müller 
et al. [24], who present a compositional component-based approach for the ver- 
ification of hybrid systems based on hybrid programs. Skill graphs provide an 
abstract and organized view of the system and are applied (1) in the verification
and validation phase of the requirements analysis and (2) in the early stages
of the design phase. Subsequently, a skill may be implemented by a set of multiple
interacting components to take more necessary specifics into account, such
as communication protocols and resource consumption. To conclude, the process 
of refining skill graphs including their safety requirements to formally specified 
component-based systems exhibits a high level of quality assurance at the level 
of both requirements engineers and software architects.


7 Conclusion and Future Work 


In this work, we proposed skill-based verification of cyber-physical systems with 
the notion of skill graphs that (1) encourages the modular development of small 
and reusable actions in isolation, and (2) enables the identification of poorly 
defined requirements in early software development processes by considering 
formal verification of hybrid systems. We provided the first formalization of skill
graphs, showed how skill graphs and hybrid programs can be combined, and
introduced a proven notion of compositionality for skills. The investigated case
study on a vehicle follow mode showcases that the compositionality property of
skill graphs is important for scaling, as the verification effort is reduced by
approximately 53%. Compositionality is particularly important for model and software
evolution, as costly re-verification of a skill’s requirement can be minimized.

For the future, we want to enable the composition of skills with dissimilar 
hybrid programs, for which the theoretical groundwork partially exists. Moreover, 
our current focus is on the integration of skill graphs into software engineering 
practices for cyber-physical systems to amplify the utilization of formal methods 
from the start of new software projects. 
Abstract. There is a growing need for the automated generation of instance
models to evaluate model-driven engineering techniques. Depending
on a chosen application scenario, a model generator has to fulfill
different requirements: As a modeling language is usually defined by a 
meta-model, all generated models are expected to conform to their meta- 
models. For performance tests of model-driven engineering techniques, 
the efficient generation of large models should be supported. When gen- 
erating several models, the resulting set of models should show some 
diversity. Interactive model generation may help in producing relevant 
models. In this paper, we present a rule-based, configurable approach 
to automate model generation which addresses the stated requirements. 
Our model generator produces valid instance models of meta-models with 
multiplicities conforming to the Eclipse Modeling Framework (EMF). An 
evaluation of the model generator shows that large EMF models (with up 
to half a million elements) can be produced. Since the model generation 
is rule-based, it can be configured beforehand or during the generation 
process to produce sets of models that are diverse to a certain extent. 


Keywords: Model generation · Model transformation · Eclipse Modeling
Framework (EMF)


1 Introduction 


The need for the automated generation of instance models grows with the steady 
increase of domains and topics to which model-driven engineering (MDE) is 
applied. In particular, there is a growing need for large instances of a given 
meta-model [14,26]. As most of the available MDE tools are based on the Eclipse 
Modeling Framework (EMF) [34], instances should be conformant to EMF. 
Depending on the chosen application scenario, a model generator has to ful- 
fill different requirements: As a modeling language is usually defined by a meta- 
model, all generated models are expected to conform to their meta-models. For 
performance tests of model-driven engineering techniques, the efficient generation
of large models should be supported. When several models are generated,
they should show some diversity. Interactive model generation may help in pro- 
ducing relevant models. While there are several tools and approaches to instance 
model generation in the literature, e.g. [15,16,30,32,36], we are not aware of any 
tool satisfying all the requirements stated above. Two extreme approaches are 
the following: The approach in [16] is very fast but does not address any mod- 
eling framework and provides very few guarantees concerning the properties of 
the generated output models. As EMF has developed into the de-facto standard
for modeling in MDE, respecting the EMF constraints is crucial to guarantee
that the resulting models can be processed by other tools in practice, e.g.,
opened in standard editors. In contrast, solver-based
approaches such as [15,32,36] provide high guarantees by generating instance 
models that even conform to additional well-formedness constraints (expressed 
in, e.g., OCL [20]), but they suffer from severe scalability issues. 


We suggest finding a good trade-off between having a scalable generation 
process for models and generating well-formed models. In this paper, we pro- 
pose a rule-based approach to the generation of models which has the following 
distinguishing features: (i) To guarantee interchangeability, generated models 
conform to the standards of EMF. In particular, this means that the contain- 
ment structure of a generated model forms a tree. (ii) Generated models exhibit 
a basic consistency in the sense that they conform to the structure and the mul- 
tiplicities specified by the meta-model. (iii) The generation of models can be 
configured to obtain models that are diverse to a certain extent. (iv) The im- 
plementation is efficient in the sense that instance models with several hundred 
thousand elements can be generated. (v) The approach is meta-model agnostic 
and customizable to a given domain-specific modeling language (DSML) in a 
fully automated way. (vi) It is possible to generate models in a batch mode or 
interactively to somewhat guide the generation process towards relevant mod- 
els. User interaction includes the setting of seed models as well as interactively 
choosing between alternative generation strategies. 


Our rule-based approach to model generation consists of two main tasks: 
(1) The meta-model of a given modeling language is translated into a rule- 
based model transformation system (MTS) containing rules for model genera- 
tion. (2) These rules are consecutively applied to generate instance models. This 
generation process may be further configured by the user. In particular, a potentially
inconsistent model may be used as a seed for generating valid models.


Our approach is implemented in two Eclipse plug-ins: A meta-tool, called 
Meta2GR, automatically derives the MTS from a given meta-model. A second 
plug-in, called EMF Model Generator, is automatically configured with the re- 
sulting MTS. A modeler uses the configured model generator, which takes ad- 
ditional user specifications and an optional seed EMF model as inputs and gen- 
erates a valid EMF model. We argue for the soundness of our approach and 
evaluate its scalability by generating large, valid EMF models (up to half a million
elements). Furthermore, we show how to generate a set of models that are
diverse to some extent. 


2 Related Work 


In our discussion of related work, we focus on generic approaches and distinguish
between solver-based, tableaux-based, and rule-based approaches. We omit
language- and application-specific approaches (e.g., [7,10]).


2.1 Solver-Based Approaches 


Solver-based approaches generate models by (i) translating a meta-model into 
a logical formula, (ii) using an off-the-shelf solver to find possible solutions, 
and (iii) translating back the found solutions into instances of the meta-model. 
In most cases, solver-based approaches are capable of generating models that 
respect well-formedness constraints such as OCL constraints since these can be 
translated into the logical formula as well. The approaches presented in [15,32,36] 
use Alloy [12] for this purpose. Although we do not see any general limitation for 
them to be applied to arbitrary meta-models, the translations to Alloy presented 
in [15,36] target dedicated domain-specific languages. The language-independent 
translation presented by Sen et al. [32] is not fully automated. The reported evaluations
show that the scalability of using an off-the-shelf solver is limited to rather
small models.


2.2 A Tableaux-Based Approach 


Schneider et al. [27] present an automated approach for the generation of sym- 
bolic attributed typed graphs fulfilling a given set of first-order constraints. The 
approach is based on a tableaux calculus for graph constraints. It produces min- 
imal symbolic models encoding (infinitely) many instances that fulfill the set of 
constraints. While this is highly desirable to get an overview of possible instance 
structures, retrieving large graphs from symbolic instances is not directly sup- 
ported. Moreover, the work does not aim at EMF; it is also not possible to add 
the EMF constraints as not all of them are first-order. The authors extend their 
work in [28] to be able to also repair given instances. This model repair can be 
used to support the generation of instances from a given seed model. The applied 
repair strategy does not incorporate any deletions of model elements. 


2.3 Rule-Based Approaches 


Ehrig et al. [9] present an approach for converting type graphs with restricted 
multiplicity constraints into instance-generating graph grammars. Taentzer gen- 
eralizes that approach in [37] to arbitrary multiplicity constraints. Both ap- 
proaches are presented for typed graphs, which means that containment edge 
types and other EMF constraints are not considered. Moreover, there is no implementation
of these approaches.

Radke et al. [24] present a translation of OCL constraints to graph constraints 
which can be integrated as application conditions into a given set of transforma- 
tion rules [17]. The resulting rules guarantee validity w.r.t. these constraints but 
might be rendered inapplicable. The work is motivated by instance generation; 
however, no dedicated algorithm is presented. 

Another grammar-based approach is presented by Mougenot et al. [16]. By 
reducing models to their containment structure, a tree grammar is derived from 
that meta-model projection. For a given size (representing the number of nodes), 
the method is capable of uniformly generating all tree structures of that size. 
Similarly, the tool EMF random instantiator [11] considers containment edges 
only. While both approaches are highly efficient, reducing models to their con- 
tainment structure is a severe oversimplification in practice. 

The frameworks RandomEMF, presented by Scheidgen [26], and EMG, presented
by Popoola et al. [23], help users to manually specify a generator that
automatically generates models. However, these frameworks do not offer any help
in ensuring that the generated models conform to the meta-model and satisfy
the required constraints.

The SiDiff model generator (SMG) has been proposed by Pietsch et al. [22]. 
It takes an existing model as input and manipulates it by applying model editing 
operations, configured by a stochastic controller. On the meta-level, the SMG was 
integrated into the approach and tool presented by Kehrer et al. [13,25], which 
generates a complete set of consistency-preserving edit operations for a given 
meta-model. It supports meta-models with somewhat restricted multiplicities, 
however. Generated edit operations can be applied to valid models only. Its 
stochastic controller has been designed to generate sequences of models that 
mimic realistic model histories [38]. The generated models are, on purpose, very 
similar to each other, i.e. they lack diversity. 


2.4 A Hybrid Approach 


A hybrid approach is implemented within the VIATRA Solver [29,30]: Rules 
are used to generate an instance model from scratch or a seed model. A solver 
is used to guarantee validity concerning additional well-formedness constraints. 
During the generation process, a partial model is extended using rules. This 
partial model is continuously evaluated w.r.t. the validity of these constraints 
using a 3-valued logic [31]. By under-approximation, the search space is pruned 
as soon as the partial model cannot be refined into a valid model. The evaluation 
of constraints is performed with a specifically developed solver or an off-the-shelf 
one. All resulting instance models fulfill the additional constraints and conform 
to EMF. Moreover, the VIATRA Solver has been investigated successfully for 
generating diverse and realistic models. While experimental results indicate that 
the approach is 1-2 orders of magnitude better than existing approaches using 
Alloy, the authors also mention that the scalability of their approach is not yet 
sufficient [29,30].
Table 1. Summary of selected generic approaches to model generation w.r.t. important
characteristics we aim at in this paper. 


Input Output Algorithm 

Category Approach impl. ex. seed EMF wf config. interact. scal. 
Solver Sen et al. [32] + — o F+ = = = 
Tableaux Schneider et al. [27,28] + o +++ ? 
Rule-based Taentzer [37] H+ o + ? 

Mougenot et al. [16] o — o + = 

Pietsch et al. [22] + o + + + + o 
Hybrid Semeráth et al. [30] + o oe o 


Rule-based Our approach } H+ H H 


2.5 Need for Further Research 


We summarize the related work through selected approaches from all categories 
in Table 1 w.r.t. important characteristics. First, we indicate whether the ap- 
proach is implemented in a tool (column 1). Second, we are interested in ma- 
nipulating an existing seed model (column 2), e.g., for the sake of generating 
model evolution scenarios. Here, o indicates that only special kinds of seeds are 
possible. Third, concerning the consistency level of generated output models, we 
are interested in the conformance with EMF (column 3) and additional well- 
formedness constraints, including multiplicities (column 4). Here, + indicates 
partial and ++ full support of multiplicity constraints, whereas +++ means
support of more general well-formedness constraints. Fourth, we are interested 
in the properties of the generation algorithm itself, which should be configurable 
(column 5), offer interaction possibilities (column 6), and be scalable (column 7) 
in order to support the generation of diverse and large instances, respectively. 
None of the generic approaches to model generation fully meets all criteria. 
Given a meta-model with multiplicities as the only well-formedness constraints, 
we are heading towards a model generator that supports all quality attributes. 


3 Running Example and Preliminaries 


This section presents our running example and preliminaries. After introducing 
the running example, we recall the Eclipse Modeling Framework (EMF), rule- 
based model transformation and a rule-based approach to model repair that we 
utilize for our approach to instance generation. 


3.1 Running Example 


As a running example, we use an excerpt of the GraphML meta-model [3] as shown
in Fig. 1. GraphML [6] is a file format for different kinds of graphs; it separates
the graph structure from additional data. We use this example to illustrate how
our rule-based approach generates instances from a given meta-model.


Fig. 1. Excerpt of the GraphML meta-model


3.2 The Eclipse Modeling Framework 


The Eclipse Modeling Framework (EMF) [34] has evolved into a de-facto stan- 
dard technology for defining models and modeling languages. In EMF, meta- 
models are defined using Ecore, an implementation of the OMG’s EMOF stan- 
dard [21]. Meta-models in Ecore prescribe the structures that instance models of 
the modeled domain should exhibit. Concepts known from UML class diagrams 
are used, namely the classification of objects and their attributes, references to 
objects, and constraints on object structures. References may be opposite to each 
other and constrained by multiplicities. Containments are a specific kind of reference.
The conformance of an instance model to a meta-model can formally be
expressed using typed attributed graphs with inheritance [4]. EMF models have 
to fulfill the following constraints: 


— At-most-one-container: Each object must not have more than one container. 

— No-containment-cycles: Cycles of containments must not occur. 

— No-parallel-edges: There are no two references of the same type from the same 
source to the same target object. 

— All-opposite-edges: If reference types t1 and t2 are opposite to each other: For 
each reference of type t1, there has to be a reference of type t2 linking the 
same objects in the opposite direction. 

— Rootedness (optional): There is an object, called root object, that contains all 
other objects of a model directly or transitively. 


In the sequel, we use the terms EMF model and instance model interchangeably.
Each model conforming to its meta-model and fulfilling the EMF constraints
listed above is called an EMF model. If the meta-model’s multiplicities are fulfilled
in addition, the model is called valid. Since we use a graph-based approach to
model transformation in the following, objects are often also called nodes and 
object references are called edges. 
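
To make the EMF constraints listed above concrete, the following Python sketch checks them on a toy graph encoding. It is an illustration under our own assumptions, not part of EMF or of our tooling: the function name `check_emf_constraints`, the triple-based edge encoding, and the example node and reference names are hypothetical, and type conformance and multiplicities are deliberately left out.

```python
from collections import Counter, defaultdict, deque


def check_emf_constraints(nodes, edges, containment_types, opposites, rooted=False):
    """Return the set of violated EMF constraints of a toy model.

    nodes: object ids; edges: list of (source, reference_type, target) whose
    endpoints all occur in nodes; containment_types: reference types that are
    containments; opposites: maps a reference type to its opposite type.
    """
    nodes = set(nodes)
    violations = set()

    # No-parallel-edges: the same (source, type, target) triple occurs twice.
    edge_set = set(edges)
    if len(edge_set) != len(edges):
        violations.add("no-parallel-edges")

    # At-most-one-container: count incoming containment edges per object.
    children = defaultdict(set)
    incoming = Counter()
    for src, ref, tgt in edge_set:
        if ref in containment_types:
            children[src].add(tgt)
            incoming[tgt] += 1
    if any(count > 1 for count in incoming.values()):
        violations.add("at-most-one-container")

    # No-containment-cycles: topological sort of the containment graph.
    indegree = {n: 0 for n in nodes}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        current = queue.popleft()
        visited += 1
        for kid in children.get(current, ()):
            indegree[kid] -= 1
            if indegree[kid] == 0:
                queue.append(kid)
    if visited != len(nodes):
        violations.add("no-containment-cycles")

    # All-opposite-edges: every edge of a type with an opposite needs its twin.
    for src, ref, tgt in edge_set:
        if ref in opposites and (tgt, opposites[ref], src) not in edge_set:
            violations.add("all-opposite-edges")

    # Rootedness (optional): some object contains all others transitively.
    if rooted and nodes:
        def contained_by(root):
            seen, stack = {root}, [root]
            while stack:
                for kid in children.get(stack.pop(), ()):
                    if kid not in seen:
                        seen.add(kid)
                        stack.append(kid)
            return seen
        if not any(contained_by(n) == nodes for n in nodes):
            violations.add("rootedness")
    return violations


# Hypothetical GraphML-like instance: a graph g with two nodes, a port, and an edge.
nodes = ["g", "n1", "n2", "p1", "e1"]
edges = [("g", "nodes", "n1"), ("g", "nodes", "n2"), ("g", "edges", "e1"),
         ("n1", "ports", "p1"), ("e1", "targetport", "p1")]
containments = {"nodes", "edges", "ports"}
print(check_emf_constraints(nodes, edges, containments, {}, rooted=True))  # prints: set()
```

Adding a second containment edge into `p1`, for instance `("n2", "ports", "p1")`, makes the check report `at-most-one-container`.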


3.3 Transformation Rules and Transformation Units 


Our model generation approach is based on the application of transformation 
rules to EMF models as implemented in the Eclipse plug-in Henshin [1,35]. This 
approach is formally underpinned by typed attributed graph transformation as 
presented in [4]. 

A (non-deleting) transformation rule consists of two model patterns, namely 
a left-hand side L and a right-hand side R where L is a sub-pattern of R; we 
denote such a rule by L → R. All elements in R \ L shall be created. A rule can
be equipped with negative application conditions (NACs) [8]. Each NAC N is an
additional pattern that includes L. All elements in N \ L are forbidden to exist.
An application of a transformation rule to a model M amounts to finding the
pattern L in M and, if such a match is found, creating a copy of R \ L there. A
rule is applicable at a match only if this match cannot be extended to a match 
for any of the NACs. 

In Henshin, rules are specified in an integrated form where elements are an- 
notated and colored according to their roles. While a created element is depicted 
in green, a forbidden element is shown in blue. In addition, a forbidden element may
be labeled with the name of the NAC it belongs to in order to distinguish several NACs. For example,
the rule insert_additionalEdge_targetport in Fig. 7 matches nodes of types Edge 
and Port and inserts an edge of type targetport between them but only if such 
an edge does not already exist and the selected Edge does not already refer to 
another Port. 
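
The effect of this rule and its two NACs can be mimicked in a few lines of Python. The sketch below is a hand-written illustration on a toy encoding (a dictionary of node types and a set of edge triples); it does not use the Henshin API, and the function name `apply_insert_targetport` is hypothetical.

```python
def apply_insert_targetport(types, edges, edge_node, port_node):
    """Apply the rule at the match (edge_node, port_node); return True if applied.

    types: dict mapping node ids to node types; edges: set of
    (source, reference_type, target) triples, modified in place.
    """
    # Match of the left-hand side L: one node of type Edge, one of type Port.
    if types.get(edge_node) != "Edge" or types.get(port_node) != "Port":
        return False

    # First NAC: the targetport edge between the matched nodes already exists.
    if (edge_node, "targetport", port_node) in edges:
        return False

    # Second NAC: the matched Edge already refers to another Port.
    if any(src == edge_node and ref == "targetport" and tgt != port_node
           for src, ref, tgt in edges):
        return False

    # Create R \ L: the new targetport edge.
    edges.add((edge_node, "targetport", port_node))
    return True


types = {"e1": "Edge", "p1": "Port", "p2": "Port"}
edges = set()
print(apply_insert_targetport(types, edges, "e1", "p1"))  # True: rule applied
print(apply_insert_targetport(types, edges, "e1", "p1"))  # False: edge exists
print(apply_insert_targetport(types, edges, "e1", "p2"))  # False: other Port set
```

Since the rule is non-deleting, a failed application leaves the model unchanged.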

To construct more complex transformations in Henshin, rules may be com- 
posed in (transformation) units. Units may have parameters that can be passed 
to contained units or rules. A ‘?’ indicates that the parameter may be randomly 
chosen. We sketch the semantics of those units which we use in the following. 
Note that each rule is already considered as a unit. 


— An independent unit comprises an arbitrary number of sub-units that are 
checked for applicability in a non-deterministic order. One applicable unit is 
executed. 

— A loop unit comprises one sub-unit and executes it as often as possible. 

— A conditional unit comprises either two or three sub-units specifying the if-unit, 
the then-unit, and optionally, the else-unit. If the if-unit is executed successfully, 
the then-unit is executed. Otherwise, if defined, the else-unit is executed. 

— A sequential unit comprises an arbitrary number of sub-units that are executed 
in the given order. If a sub-unit is not applicable, it is skipped and the 
execution continues with the next sub-unit. 

— A priority unit comprises an arbitrary number of sub-units that are checked 
for applicability in the defined order. If a sub-unit is executed successfully, 
the check and execution of the following sub-units are skipped. 
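
The unit semantics sketched above can be paraphrased as higher-order functions. The following Python sketch is our own simplification and not the Henshin implementation: a unit is modeled as a function that takes a model and returns True if it was executed successfully, and checking applicability coincides with attempting execution. The return values of `loop`, `sequential`, and of a conditional unit without an else-unit are assumptions, as they are not fixed by the description above.

```python
import random


def independent(*units, rng=random):
    """Try the sub-units in random order; execute the first applicable one."""
    def run(model):
        order = list(units)
        rng.shuffle(order)
        return any(unit(model) for unit in order)  # stops at the first success
    return run


def loop(unit):
    """Execute the sub-unit as often as possible."""
    def run(model):
        while unit(model):
            pass
        return True  # assumption: a loop unit always succeeds
    return run


def conditional(if_unit, then_unit, else_unit=None):
    """Execute then_unit if if_unit succeeds, otherwise else_unit (if defined)."""
    def run(model):
        if if_unit(model):
            return then_unit(model)
        return else_unit(model) if else_unit else False  # assumption
    return run


def sequential(*units):
    """Execute the sub-units in order; inapplicable sub-units are skipped."""
    def run(model):
        results = [unit(model) for unit in units]
        return any(results)  # assumption: success if any sub-unit was executed
    return run


def priority(*units):
    """Execute the first applicable sub-unit in the defined order."""
    def run(model):
        return any(unit(model) for unit in units)
    return run


def add_item(model):
    """Toy rule: applicable while the model holds fewer than three items."""
    if len(model) >= 3:
        return False
    model.append(1)
    return True


def never(model):
    """Toy rule that is never applicable."""
    return False


model = []
loop(add_item)(model)
print(model)  # prints: [1, 1, 1]
```

In this encoding, the nested unit of Sect. 4.3 for inserting edges would be written as `priority(insert_rule, sequential(conditional(...), conditional(...)))`, where the rule names are placeholders.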
3.4 EMF Repair


Our generation process of instance models uses the repair process for EMF in- 
stance models presented in [19]. The basic approach is to derive repair rules from 
a given meta-model. The derived rules are first used to trim the model such that
no upper bound is violated any longer. Subsequently, the model is completed by
adding nodes and edges until no lower bound is violated. The rules are designed
such that, during the completion phase, no upper bound violation is introduced 
and that both phases terminate only if no violation of multiplicities occurs any 
longer. We formally proved these properties in [18]. While this process does not 
necessarily terminate, its termination has been proven for instance models of 
fully finitely instantiable meta-models. A meta-model is called fully finitely
instantiable (f.f.i.) if, for every given finite EMF model M that instantiates it and
respects upper bounds but may violate lower bounds, there exists a finite and
valid EMF model M′ such that M is a submodel of M′.


4 Rule-Based Instance Generation 


We start this section with an overview of our approach to the generation of 
valid EMF models. Thereafter, we present the kinds of generation rules that 
are derived from a given meta-model, introduce four parametrization strategies 
for generation processes, and show possibilities of user-interaction. Finally, we 
discuss the limitations of our generation approach and the formal guarantees 
that have been shown. 


4.1 Overall Approach 


Our overall approach to instance generation is depicted in Fig. 2. The funda- 
mental idea behind our approach is to base model generation as far as possible 
on rule-based model repair using the tool EMF Repair [19]. All rules needed 
to perform model generation steps are automatically derived from the given 
meta-model by the meta-tool Meta2GR. If a non-empty seed model is given, 
the model generation process starts with checking it for upper bound violations 
and potentially trimming it using EMF Repair (model trimming). Thereafter, 
the EMF model is extended with object nodes and references without violating 
upper bounds using the rules derived by Meta2GR (model increase). The result- 
ing model shall meet user specifications w.r.t. its size which will be discussed in 
more detail in Sect. 4.3 below. In the next step, the EMF model is completed to 
a valid EMF model, again using EMF Repair (model completion). As this repair 
process adds elements only, the user specifications are still met by the resulting 
model. Moreover, the result is guaranteed to be a valid EMF model [18]. EMF 
Repair is also used to set attribute values, either randomly or using user input 
which is provided in a JSON-file. 
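
The interplay of the three phases can be illustrated on a drastically simplified model. The Python sketch below is a toy under our own assumptions and not the implementation of EMF Repair or Meta2GR: a model consists of one root object whose containment references are lists of children, the reference names and bounds in `BOUNDS` are hypothetical, and non-containment edges and attributes are ignored.

```python
import random

# Hypothetical containment references of a single root object:
# reference type -> (lower bound, upper bound); None means unbounded.
BOUNDS = {"nodes": (1, None), "edges": (0, None), "keys": (2, 4), "subgraph": (0, 1)}


def size(model):
    """Number of contained elements."""
    return sum(len(children) for children in model.values())


def trim(model):
    """Model trimming: delete children that exceed an upper bound."""
    for ref, (_, upper) in BOUNDS.items():
        if upper is not None:
            del model[ref][upper:]


def increase(model, min_elements, rng):
    """Model increase: add children without violating upper bounds."""
    while size(model) < min_elements:
        open_refs = [ref for ref, (_, upper) in BOUNDS.items()
                     if upper is None or len(model[ref]) < upper]
        if not open_refs:
            break  # no creation rule is applicable any more
        ref = rng.choice(open_refs)
        model[ref].append(f"{ref}{len(model[ref])}")


def complete(model):
    """Model completion: add children until all lower bounds are met."""
    for ref, (lower, _) in BOUNDS.items():
        while len(model[ref]) < lower:
            model[ref].append(f"{ref}{len(model[ref])}")


def is_valid(model):
    """Check all multiplicities of the toy meta-model."""
    return all(lower <= len(model[ref]) and (upper is None or len(model[ref]) <= upper)
               for ref, (lower, upper) in BOUNDS.items())


def generate(seed, min_elements, rng):
    """Run trimming, increase, and completion on a copy of the seed."""
    model = {ref: list(seed.get(ref, [])) for ref in BOUNDS}
    trim(model)
    increase(model, min_elements, rng)
    complete(model)
    return model


# The seed violates the upper bound of subgraph and is trimmed first.
result = generate({"subgraph": ["a", "b", "c"]}, 10, random.Random(1))
print(size(result) >= 10, is_valid(result))  # prints: True True
```

Because completion only adds elements, the minimum size reached in the increase phase still holds for the final model, which mirrors the argument given above.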
Fig. 2. Rule-based EMF Model Generator


4.2 Generated Rules for Model Generation 


Given a meta-model, different kinds of rules are derived for generating EMF 
models. They are listed in Table 2. The derived rules are needed to perform the 
following tasks: (i) creation of nodes, (ii) insertion of non-containment edges, 
and (iii) checking for the existence of source or target nodes for an edge of a 
certain type. All rules that create model elements (i.e., the rules of kinds (i) and 
(ii)) are generated with NACs so as not to introduce upper bound violations during
generation. Moreover, they all are consistent transformation rules in the sense 
of [4]. This means that they preserve consistency w.r.t. the EMF constraints 
including rootedness (compare [4, Theorems 1 and 2]). For example, our rules 
cannot introduce containment cycles or parallel edges by design. 


Table 2. Overview of rule kinds used for model generation 


Role         Kind                            Semantics

Create node  Additional-node-creation rules  Create a node of a certain type and insert it
                                             into one of its direct containers
             Transitive-node-creation rules  Create a node of a certain type and insert it
                                             into one of its transitive containers
Create edge  Additional-edge-creation rules  Create an edge of non-containment type
                                             between two nodes
Check edge   Additional-edge-checking rules  Check if possible source and target nodes
                                             exist for an edge of a certain type
Fig. 3. Rule schema for transitive-node-creation rules (of length 2)


Node creation (i) is performed by two sets of rules, additional-node-creation 
rules and transitive-node-creation rules. The latter ones are described as follows: 
For every concrete node type in the meta-model, every possible incoming path 
over containment edges is computed such that each containment type occurs 
at most once. For each such path, a rule is derived that matches the node
where this path starts and creates the rest of this path. Each rule is equipped 
with a NAC ensuring that no upper bound violation can be introduced. An ex- 
ample schema of length 2 for this kind of rule is depicted in Fig. 3. The lower part 
of Fig. 6 depicts all transitive-node-creation rules that are derived for the type 
Port. Only one rule is equipped with a NAC as the edge type subgraph is the only
one with an upper bound (of 1). In EMF, if a containment edge has an opposite 
edge, the upper bound of the opposite edge must be 1. If a containment edge is 
created, the opposite edge is created automatically. Therefore, we do not repre- 
sent it here. Additional-node-creation rules are transitive-node-creation rules of 
length 1. We derive both kinds of rules for different parametrizations of our gen- 
eration algorithm which are introduced in Sect. 4.3. The rule add_in_Node_a_Port 
in Fig. 6 is an example derived for the containment edge type ports. It does not 
have a NAC since the upper bound of ports is unlimited. 
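
The path computation behind the transitive-node-creation rules can be sketched as a backward search over containment edge types. The following Python code is a simplified illustration: the function name `incoming_containment_paths` is our own, inheritance is ignored, and the containment list is a hypothetical reading of the GraphML excerpt rather than the exact meta-model of Fig. 1.

```python
def incoming_containment_paths(containments, node_type):
    """Enumerate all containment paths ending in node_type.

    containments: list of (reference_name, source_type, target_type).
    Each containment type occurs at most once per path. One rule would be
    derived per path; it matches a node of the path's first source type and
    creates the rest of the path.
    """
    paths = []

    def extend(path, start_type):
        for containment in containments:
            _, source, target = containment
            if target == start_type and containment not in path:
                new_path = (containment,) + path
                paths.append(new_path)
                extend(new_path, source)

    extend((), node_type)
    return paths


# Hypothetical containment edge types of the GraphML excerpt.
containments = [
    ("graphs", "GraphML", "Graph"),
    ("nodes", "Graph", "Node"),
    ("ports", "Node", "Port"),
    ("subgraph", "Node", "Graph"),
]

for path in incoming_containment_paths(containments, "Port"):
    start = path[0][1]
    print(start, "->", " / ".join(name for name, _, _ in path))
```

For the node type Port, this yields four paths under the assumed containments. The path of length 1 (`ports`, starting at a Node) corresponds to an additional-node-creation rule, and only the path through `subgraph` would need a NAC for an upper bound.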


To create non-containment edges (ii), additional-edge-creation rules are gen- 
erated. The general schema for these kinds of rules is depicted in Fig. 4. For each 
non-containment edge type, a rule is derived that matches the source and the 
target nodes suitable to this edge type and creates an edge of the corresponding 
type. Again, a NAC prevents an upper bound from being violated (NACn). A second
NAC prevents parallel edges from being introduced (NACp). If the given edge type
has an opposite edge type, the opposite edge is created as well and its upper
bound is considered accordingly (NACI). A concrete example for the edge type
targetport is the rule insert_additionalEdge_targetport as depicted in Fig. 7.


Fig. 4. Rule schema for additional-edge-creation rules

As non-containment edges may be added optionally according to user specifications
(see Sect. 4.3), it is necessary to check if nodes of certain types exist
and can serve as source or target nodes of an additional edge without violat- 
ing the upper bounds of the respective edge type (iii). This check is performed 
with additional-edge-checking rules which are derived for non-containment edge 
types. The general schema is depicted in Fig. 5. Such a rule is applicable if and 
only if there exists a source node where the upper bound of the edge type is not 
yet reached. The same kind of rule is derived for the target node type as well. 
The rule check_proper_sourceNode_for_targetport in Fig. 7 is a concrete example 
for the edge type targetport. 
Fig. 5. Rule schema for additional-edge-checking rules


4.3 Generation Strategies: Parameterization 


Since we use a rule-based approach, the model generator can be parameterized 
w.r.t. a given user specification. In the following, we present four strategies for 
generating models w.r.t. user specifications; they serve to specify the model 
increase phase of the generation process. The models resulting from this phase 
conform to EMF and meet the user specification but may violate lower bounds. 
They are used as input for the model repair algorithm of EMF Repair to obtain
a valid EMF model. The user may (1) specify the minimum number of elements to
be created, (2) specify a node type and the minimum number of nodes of this
type to be created, (3) specify an edge type and the minimum number
of edges of this type to be created, or (4) combine the above-mentioned
strategies sequentially in arbitrary order. If the user has not specified
any model as a seed, the generation is initialized by creating a root node.


Fig. 6. Independent unit for randomly creating a containment tree containing a fixed
number of nodes of type Port


Adding elements of arbitrary types. In this strategy, the user specifies the 
minimum of model elements (i.e., nodes and edges) to be created. The idea 
behind this strategy is to randomly execute a set of rules for adding nodes 
and edges of arbitrary types without violating the corresponding upper bounds 
and the EMF constraints. Hence, all rules of kinds additional-node-creation and 
additional-edge-creation are collected into an independent unit which is applied
as often as the user specification requires. While the independent unit is imple- 
mented in Henshin using a uniform distribution, this strategy may also be per- 
formed using other distributions by, e.g., leveraging a stochastic controller [38]. 
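
The strategy above can be illustrated with a minimal Python sketch. It is not the Henshin implementation: the toy model (a dictionary of node counts), the helper names `make_node_rule`, `independent_unit` and `increase`, and the fixed random seed are all assumptions made for illustration. Each toy rule refuses to fire once its upper bound is reached, which plays the role of the NACs mentioned in the text.

```python
import random
from typing import Callable, Dict, List

Model = Dict[str, int]            # toy model: node type -> number of nodes
Rule = Callable[[Model], bool]    # a rule returns True if it was applied


def make_node_rule(node_type: str, upper_bound: int) -> Rule:
    """Toy node-creation rule guarded by an upper bound (hypothetical helper)."""
    def rule(model: Model) -> bool:
        if model.get(node_type, 0) >= upper_bound:
            return False          # applying the rule would violate the upper bound
        model[node_type] = model.get(node_type, 0) + 1
        return True
    return rule


def independent_unit(model: Model, rules: List[Rule], rng: random.Random) -> bool:
    """Try the rules in uniformly random order; stop at the first one that applies."""
    for rule in rng.sample(rules, len(rules)):
        if rule(model):
            return True
    return False


def increase(model: Model, rules: List[Rule], times: int, seed: int = 0) -> int:
    """Apply the unit `times` times; return the number of successful applications."""
    rng = random.Random(seed)
    return sum(independent_unit(model, rules, rng) for _ in range(times))
```

Once every rule has reached its upper bound, further applications simply fail, so the number of successful applications may be smaller than requested.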


Adding nodes of a specific type. In this strategy, the user specifies a node 
type and the minimum number of nodes of this type that shall be created. This 
strategy is implemented as an independent unit that contains all 
transitive-node-creation rules for the specified node type and is applied as often as the user has 
specified. An example unit for the node type Port is given in Fig. 6. 


Adding edges of a specific type. In this strategy, the user specifies a (non- 
containment) edge type and the minimum number of edges that shall be created 
of this type. This strategy is similar to the previous one, thus its basis is a unit 
that contains the additional-edge creation rule for the specified type. If this rule 
is not applicable, however, a source or a target node (or both) for an additional 
edge of that type is missing. The additional-edge-checking rules for this edge type 
are used to detect such situations. Then, corresponding transitive-node-creation 
rules for the type of the missing node are used to create the missing source 
Fig. 7. Units for inserting a fixed number of edges of type targetport 


and/or target node(s). This strategy is implemented as a priority unit where the 
first contained unit is the additional-edge-insertion rule. Its second contained 
unit is a sequential one with two conditional units checking for missing source 
or target nodes, respectively, and creating corresponding nodes if needed. 

Figure 7 presents a priority unit that implements this strategy for the edge type 
targetport. The first level contains the rule insert_additionalEdge.... 
The second level is the sequential unit add_proper_source_target_Node...: 
The conditional unit check.add_proper_sourceNode... uses the rule 
check_proper_sourceNode... in the if-statement. The then-statement is set to true 
whereas the else-statement is configured with a priority unit add_treeNode_Edge 
which adds an Edge-node respecting upper bounds and the EMF constraints. 
The conditional unit adding a missing target node is defined analogously. 
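
The control flow of such a priority unit can be sketched in plain Python. This is a toy reading of the description above, not the Henshin unit of Fig. 7: the class `ToyModel`, the constant `UPPER_BOUND`, the driver loop `insert_edges` and its step budget are assumptions made for illustration, and the function names only loosely echo the rule names in the text.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

UPPER_BOUND = 1  # assumed: each source node may carry at most one outgoing edge


@dataclass
class ToyModel:
    sources: List[str] = field(default_factory=list)
    targets: List[str] = field(default_factory=list)
    edges: Set[Tuple[str, str]] = field(default_factory=set)

    def out_degree(self, source: str) -> int:
        return sum(1 for (s, _) in self.edges if s == source)


def has_proper_source(m: ToyModel) -> bool:
    """Checking rule: is there a source that can still take an edge?"""
    return any(m.out_degree(s) < UPPER_BOUND for s in m.sources)


def has_proper_target(m: ToyModel) -> bool:
    """Checking rule: is there any target node at all?"""
    return bool(m.targets)


def insert_additional_edge(m: ToyModel) -> bool:
    """First level of the priority unit: try to insert one more edge."""
    for s in m.sources:
        if m.out_degree(s) >= UPPER_BOUND:
            continue
        for t in m.targets:
            if (s, t) not in m.edges:   # no parallel edges
                m.edges.add((s, t))
                return True
    return False


def add_proper_source_target(m: ToyModel) -> bool:
    """Second level: sequentially create whichever node is missing."""
    if not has_proper_source(m):
        m.sources.append(f"s{len(m.sources)}")
    if not has_proper_target(m):
        m.targets.append(f"t{len(m.targets)}")
    return True


def priority_unit(m: ToyModel) -> bool:
    """Apply the first applicable sub-unit, in priority order."""
    return insert_additional_edge(m) or add_proper_source_target(m)


def insert_edges(m: ToyModel, count: int, max_steps: int = 1000) -> bool:
    """Repeat the priority unit until `count` edges exist or the budget runs out."""
    steps = 0
    while len(m.edges) < count and steps < max_steps:
        priority_unit(m)
        steps += 1
    return len(m.edges) >= count
```

Starting from an empty toy model, the unit alternates between creating a missing node and inserting an edge, so each requested edge costs at most two applications here.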


Sequential combination of strategies. As our approach allows for an arbi- 
trary seed model as input, the result of applying one strategy can be used as 
input for applying the second one. This allows for arbitrary sequential combina- 
tions of strategies. 


4.4 User Interaction 


Since our approach is rule-based, it is also possible to allow for user interaction. 
Instead of random rule applications at random matches, the available rules and 
matches can be presented to the user for selecting at which match a rule has 
to be applied and how many times. That is promising for generating different 
tree structures of various weights. While it may not be desirable to completely 
generate large models in such a way, a hybrid strategy can be applied to utilize 
the selection process, e.g., by employing heuristic data. EMF Repair already 
supports this kind of user interaction. 


4.5 Limitations and Formal Guarantees 


Limitations. A user may only specify the minimum number of desired elements; 
the specification of a maximum number is not yet supported within our ap- 
proach. Although the generation process applies the respective rules exactly as 
often as specified during the model increase phase, some of the rules create more 
than one element and additional model elements may be created to repair viola- 
tions of lower bounds during the consecutive model repair. Moreover, we cannot 
guarantee that the user specification is fully met since necessary rules may not 
be applicable as often as specified and backtracking is not used. Even if the 
specification could be met in principle, it may happen that the specific selection, 
order, and matches of rules do not succeed as they are randomly chosen in the 
current version of the approach. By counting created elements, it can always 
be decided whether a user specification has been met, and thus, the user can 
be informed. In our experiments (in Sect. 6), every generated output meets the 
selected specifications. Thus, while more research is needed to precisely evaluate 
the severity of our limitations, the performed experiments are positive evidence 
that these limitations are rather small even for reasonably complex meta-models. 
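
The check "by counting created elements" is simple enough to sketch. The function below is a hypothetical helper, not part of the tool: it assumes the specification and the element counts are both given as dictionaries keyed by type name, and it reports by how much each minimum is missed.

```python
from typing import Dict


def unmet_requirements(spec: Dict[str, int], counts: Dict[str, int]) -> Dict[str, int]:
    """Map each type whose minimum is not reached to the number of missing elements."""
    return {
        kind: minimum - counts.get(kind, 0)
        for kind, minimum in spec.items()
        if counts.get(kind, 0) < minimum
    }
```

An empty result means the user specification has been met; otherwise the user can be told which minima were missed.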


Formal guarantees. In case of termination, our approach guarantees a valid EMF 
model as output: All generation rules conform to a design that is proven to 
preserve EMF constraints in [4]. Moreover, applications of these rules cannot 
introduce violations of upper bounds as they are equipped with corresponding 
NACs. So each strategy mentioned above is guaranteed to result in an instance 
model that conforms to EMF and does not violate any upper bounds. Moreover, 
it is ensured by the finite number of rule calls specified in each strategy that the 
increase phase terminates. Thus, suitable input for the model completion process 
of EMF Repair [19] is ensured after finitely many steps. For model completion, 
termination was proven in the case of f.f.i. meta-models while correctness was 
proven in all cases in [18]. If the user specification is met after a model has been 
increased, it is met after model completion as well since no deletion takes place 
during model completion. Even an increased model that does not meet the user 
specification is an EMF model and hence a suitable input for EMF Repair. Thus, 
it can be completed and returned to the user as a valid EMF model. The given 
user specification, however, is only partly satisfied in this case. 


5 Tooling 


We have developed two Eclipse plug-ins that are available for download.³ The 
first plug-in is a meta-tool, called Meta2GR. It takes a domain meta-model as 
input and derives an MTS in Henshin. This is achieved by applying the meta- 
patterns that are depicted in Figs. 3 to 5 to the given domain meta-model. These 
meta-patterns are specified as rules typed over the Ecore meta-metamodel. Based 
on their matches, domain-specific model generation rules of different kinds are 


3 https://github.com/RuleBasedApproach/EMFModelGenerator/wiki 
created. For a given meta-model, the MTS has to be generated only once. The 
second Eclipse plug-in, called EMF Model Generator, is a modeling tool that 
uses the derived MTS to generate instance models. Given a user specification 
and, optionally, one or more seed EMF models, this model generator creates 
valid EMF models in batch mode or incrementally. 


6 Evaluation 


Next to the formal guarantees which are provided by construction, we empirically 
evaluate our approach w.r.t. the following research questions: 

RQ 1: How fast can instance models of varying sizes be generated? 

RQ 2: Does the use of parametrization help to increase the diversity? 
All experiments were performed on a desktop PC, Intel Core i7, 16 GB RAM, 
Windows 7 x64 using Eclipse Oxygen. Our Eclipse-based tool was configured 
to use the default settings, e.g., the heap size was limited to 1 GB. All the 
evaluation artifacts are available for download. 


6.1 Scalability Experiments 


To answer RQ 1, we conducted two scalability experiments. We used 8 meta- 
models taken from the literature and projects, namely the Statechart meta-model 
of Magicdraw [13], Web model [5], Car Rental and Class model [2], Bugzilla, 
Latex, Warehouse, and GraphML (GML) [3]. The average size of the meta- 
models is 44 elements (16 nodes, 17 edges, 11 attributes) and the number of 
multiplicity bounds is 24 on average. The overhead for generating the needed 
transformation rules and units was, on average, less than 5 seconds, and we will 
thus focus on the run-time of the model generation in the sequel. 


Experiment 1. In the first experiment, we randomly generated valid EMF models 
of varying sizes up to 10000 elements (counting nodes and edges) for each meta- 
model using Strategy (1) (in Sect. 4.3). For each size category, we generated 
10 valid EMF models and calculated the average run-time. Table 3 presents the 
results of this experiment. Considering all the meta-models and generated models 
of varying sizes, our tool always generates a valid EMF model with at least 
10000 elements. Generation times were fastest for the Bugzilla meta-model and 
slowest for the GraphML one. To assess how robust the times are, we measured 
the time for generating a seed and for the subsequent repair separately. For each 
one, we also computed the corrected standard deviation (which is presented 
for model size 10000 only). Generating the seed is generally faster than the 
subsequent repair, except for the StateChart and Warehouse meta-models. If 
the standard deviation is rather high, this tends to be the case for both the 
seed generation and the repair (as for GraphML, Web Model, and Class Model). 
A closer inspection of the meta-models shows that higher run-times, as well as 
higher deviations of run-times, are caused by larger meta-model sizes (and hence 
larger sizes of derived MTSs) and higher numbers of interrelated multiplicity 
constraints. 
Table 3. Average run-time (in seconds) for generating valid EMF models of varying 
sizes for 8 meta-models (MM) using Strategy (1); for size 10000, run-time is split into 
the generation of seed and subsequent repair where the corrected standard deviation 
is added in brackets, respectively. 


MM \ Model Size   1000   3000   5000   8000   10 000 

Bugzilla          0.05   0.1    0.1    0.1    0.08 (0.006) + 0.04 (0.01) 
Car Rental        0.27   5      17.9   72.3   65.5 (7.2) + 78.1 (4) 
Class Model       0.16   17     94     615    13.2 (14.2) + 85 (113.8) 
CoreWarehouse     0.81   4.5    18.9   67.9   0.4 (0.02) + 131 (10.9) 
GraphML           0.4    26     16.7   79.2   39.3 (56) + 168.1 (119.6) 
Latex             127    13     13     15     0.7 (0.01) + 0.8 (0.03) 
StateChart        0.55   17     55     187    35.8 (3.9) + 1 (0.3) 
Web Model         0.16   14     51     146    18.7 (18.8) + 6.2 (26) 


Table 4. Average run-time and standard deviation (in minutes) for generating valid 
EMF models of varying huge sizes for the GraphML meta-model using Strategy (3). 
The standard deviations are presented in brackets. 


Model Size             200 000    300 000      400 000      500 000 
Average Time (Min.)    6 (1.4)    11.4 (2.6)   23.3 (5.7)   32.5 (6.5) 


Experiment 2. The second experiment is dedicated to generating huge models for 
a complex meta-model, which leads to complex model repair processes. The 
GraphML meta-model suits this purpose because its number of non-zero lower 
bounds is above the average. Fulfilling these bounds makes model repair 
a complex process. We expect the generation of models to become faster 
when using Strategy (3), i.e., when specifying a minimal number of edge 
occurrences of a certain type. In this case, nodes are introduced together with incident 
edges; this generation behavior should reduce the number of repairs needed 
to fix lower bound violations. Models with an average size between 
200 000 and 500 000 elements are generated in 6 to 32.5 minutes on average. Each 
generation process was repeated five times. The standard deviation was between 
1.4 and 6.5 minutes, i.e., the run-times for the generation of these huge models 
are fairly stable. Table 4 presents the experiment results. Moreover, to give an 
impression of the tool performance for simple meta-models, we applied it to the 
Bugzilla meta-model. It is considered simple since it consists of unrestricted 
containment edges only. The tool needed only 1.2 minutes to generate a valid 
EMF model with a minimum of 500 000 elements. 


6.2 Diversity Experiment 


To test if the parametrization of our algorithm has some effect on the diver- 
sity of generated models, we conducted the following experiment. We took the 
GraphML meta-model and chose Strategy (1) to randomly create 10 instance 
Table 5. Diversity of randomly generated instance models parametrized by node types 
of the GraphML meta-model (EL = Element, K = Key, etc.; compare Fig. 1) 


                 Str. (1) | Str. (2) 
Specified Type   All      | EL.   K.    G.    E.    H.E.  N.    P.    E.P.  D. 

Shannon Index    3        | 2.12  0.82  0.76  0.94  0.92  0.99  1.57  1.48  2.06 


models containing about 2000 elements. For each node type as parameter, we 
created 10 instance models containing about 2000 elements according to Strat- 
egy (2) which specifies that this node type has to occur at least 500 times. For 
each of the resulting sets of model instances we calculated the Shannon index [33], 
$-\sum_{i} \frac{n_i}{N} \lg \frac{n_i}{N}$, an established diversity measure. Here, $N$ is the total number 
of nodes in the given set, $i$ ranges over the 9 non-abstract node types in the 
GraphML meta-model, and $n_i$ is the number of nodes of that type in the given 
set. The resulting indices are presented in Table 5. Considering Strategy (1), the 
types of occurring elements show a nearly uniform distribution, as the maximal 
possible Shannon index is $\lg 9 \approx 3.17$. The indices for Strategy (2) show that 
the distribution of elements significantly differs, depending on the selected node 
type. 
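
The Shannon index used here is easy to recompute from node-type counts. The sketch below assumes the counts are given as a dictionary from type name to number of nodes, and that lg denotes the base-2 logarithm, which is consistent with the stated maximum of lg 9 for nine types; the function name is illustrative.

```python
import math
from typing import Dict


def shannon_index(type_counts: Dict[str, int]) -> float:
    """Shannon index -sum_i (n_i/N) * log2(n_i/N) over the types that occur."""
    total = sum(type_counts.values())
    if total == 0:
        raise ValueError("the set of models contains no nodes")
    return -sum(
        (n / total) * math.log2(n / total)
        for n in type_counts.values()
        if n > 0          # types that do not occur contribute nothing
    )
```

A set in which all nine types occur equally often attains the maximum, while a set dominated by one type yields an index close to 0.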

To assess that even the sets with similar Shannon indexes differ from one 
another, we checked for the types actually occurring in each set and compared 
them. The results are depicted in Fig. 8. For example, 66% of the nodes are of 
type HyperEdge if HyperEdge (H.E.) is chosen as type parameter, and 68% of the 
nodes are of type Edge if Edge (E.) is chosen as parameter, even though both sets 
of models exhibit almost the same Shannon index. 
Fig. 8. Relative number of occurrences (x-axis) of node types (y-axis) in all the instance 
models generated using Strategy (2); results obtained for different parameter settings 
are encoded in colors and each color indicates one instance model. For example, 79.26% 
nodes of type Graph and 20.74% nodes of type Node are created in an instance model 
for parameter Graph (G.). 

To answer RQ 2, choosing different node types as parameter leads to 
significantly different distributions of the node types of occurring elements. Hence, 
Strategy (2) can be used to introduce a certain diversity. 


6.3 Threats to Validity 


In our evaluation, we selected 8 meta-models. Evaluation results might differ 
when choosing others. We are confident, however, that our results are represen- 
tative as we selected meta-models from diverse backgrounds, with reasonable 
sizes, and with varying numbers and forms of multiplicities. The used metric 
to measure diversity completely abstracts from details of the underlying graph 
structures of generated instance models. On the one hand, abstracting from such 
details typically underrates diversity rather than overrating it. On the other 
hand, we have to acknowledge that the form of diversity we show in our experi- 
ments is limited to the distribution of types. 


7 Conclusion and Future Work 


We developed a rule-based approach for generating valid models w.r.t. arbitrary 
multiplicities and EMF constraints. Since we use a rule-based approach, our 
generator is configurable to support user specifications and to allow user inter- 
action. Several parameterization strategies are presented to generate different 
sets of valid EMF models. Two Eclipse plug-ins have been developed: Meta2GR 
automatically translates the meta-model of a given DSML to an MTS and the 
EMF Model Generator uses the derived MTS to generate valid EMF models. 
We evaluated the scalability of our approach by generating large instances of 
several meta-models of different domains and showed that models with 10000 
elements can be generated in about a minute on average. Furthermore, our tool 
can generate valid EMF models of 500000 elements in less than 2 minutes for 
a meta-model with largely unrelated multiplicity constraints and in about 30 
minutes for a meta-model with closely interrelated ones. Moreover, we showed 
that a certain form of diversity between the generated models can be achieved 
by configuration. As future work, we intend to support meta-models with OCL 
constraints, at least partly: Integrating the constraints as application conditions 
into rules [17,24] is a promising basis to extend our approach in this direction. 
Besides, we want to support further configuration facilities which allow us to 
generate realistic models by leveraging a stochastic controller [38]. 
Abstract. Family-based SPL model checking concerns the simultaneous 
verification of multiple product models, aiming to improve on enumerative 
product-based verification by capitalising on the common features 
and behaviour of products in a software product line (SPL), typically 
modelled as a featured transition system (FTS). We propose efficient 
family-based SPL model checking of modal $\mu$-calculus formulae on FTSs 
based on variability parity games, which extend parity games with 
conditional edges labelled with feature configurations. To this end, we reduce the SPL 
model checking problem for the modal $\mu$-calculus on FTSs to the 
variability parity game solving problem, based on an encoding of FTSs as 
variability parity games. We validate our contribution by experiments on 
SPL benchmark models, which demonstrate that a novel family-based 
algorithm to collectively solve variability parity games, using symbolic 
representations of the configuration sets, outperforms the product-based 
method of solving the standard parity games obtained by projection with 
classical algorithms. 


1 Introduction 


Software product line engineering (SPLE) is a software engineering method for 
cost-effective and time-efficient development of a family of software-intensive 
configurable systems, according to which individual products (system variants) 
can be distinguished by the features they provide, where a feature is typically 
understood as some user-aware (difference in) functionality [1,2]. The intrinsic 
variability of SPLs challenges formal methods and analysis tools, because the 
number of possible products may be exponential in the number of features and 
each product may moreover exhibit a large behavioural state space. 

The SPL model checking problem, first recognised in the seminal paper [3], 
generalises the classical model checking problem in the following way: given a 
formula, determine for each product whether it satisfies the formula (and, ideally, 
provide a counterexample for each product that does not satisfy the formula). A 
straightforward way to solve this problem is to provide a model for each product 
and apply classical model checking. This enumerative, product-based method has 
several drawbacks. Most importantly, the state-space explosion problem, typical 
of model checking, is amplified with the number of products, while products of a 
product line usually have a large amount of features and behaviour in common. 

Therefore, Classen et al. have extended labelled transition systems (LTSs) 
with features to concisely describe and analyse the combined behaviour of a fam- 
ily of models [3-5]. Concretely, transitions in the resulting featured transition 
systems (FTSs) are labelled with actions and feature expressions. Given a prod- 
uct, a transition can be executed if the product fulfills the feature expression. 
Hence, an FTS incorporates all eligible product behaviour, and each individual 
product’s behaviour can be obtained as an LTS. Moreover, FTSs cater for the si- 
multaneous verification of multiple products, known as family-based analysis [6]. 

Properties of behavioural models for SPLs such as FTSs can be verified with 
dedicated SPL model checkers like SNIP [7], ProVeLines [8], VMC [9], Pro- 
Feat [10,11], or QFLan [12,13], or with classical model checkers like NuSMV [14, 
15], SPIN [16], Maude [17], or mCRL2 [18,19]. The advantage of using estab- 
lished off-the-shelf model checkers for SPL analysis is obvious: it lifts the burden 
of maintaining dedicated model checkers in favour of highly optimised tools with 
a broad user base. In [19], it was shown how to perform family-based SPL model 
checking with mCRL2 [20,21] of properties of FTSs expressed in a feature-oriented 
variant of the modal $\mu$-calculus to deal with transitions labelled with 
feature expressions [22]. However, this approach is based on a decision procedure 
for the binary partitioning of the product space into products that do and those 
that do not satisfy a given formula, and it is underlined that computing suitable 
partitionings for the conducted experiments is a largely manual activity. 

In this paper, we present efficient family-based SPL model checking of modal 
$\mu$-calculus formulae on FTSs based on parity games with variability. Years after 
its introduction [3,14], family-based model checking of SPLs or program fam- 
ilies is still a popular topic [10, 16, 19, 23-26], including a few game-theoretic 
approaches based on solving (3-valued) model checking games on featured sym- 
bolic automata and on modal transition systems. A parity game is a 2-player 
turn-based graph game. It is well known that the model checking problem for 
modal $\mu$-calculus formulae on LTSs is equivalent to parity game solving, for which 
Zielonka defined a recursive algorithm that performs well in practice [27-29]. 

Here we introduce variability parity games as a generalisation of parity games 
with conditional edges labelled with feature configurations. We then show how 
the SPL model checking problem for modal $\mu$-calculus formulae on FTSs can be 
reduced to the variability parity game solving problem based on an encoding of 
FTSs as variability parity games. Finally, we show the results of implementing 
two different methods, product-based and family-based, to solve variability par- 
ity games and of experimenting with them on two well-known SPL case studies, 
the minepump and the elevator. The product-based method simply projects a 
variability parity game to the different configurations and independently solves 
all resulting parity games with existing algorithms. The family-based method, in- 
stead, is based on a novel algorithm to collectively solve variability parity games, 
using symbolic representations of sets of configurations. The experiments clearly 
show that the family-based method outperforms the product-based method. 

Outline. After defining some preliminary notions in Section 2, we introduce 
SPL model checking in Section 3. In Section 4, we introduce variability parity 
games and show how they can be used to solve the SPL model checking problem. 
In Section 5, we present a family-based, collective strategy for recursively solving 
variability parity games, which we experiment with on two SPL case studies in 
Section 6. Section 7 concludes the paper and provides directions for future work. 
Relevant related work other than the above is mentioned throughout the paper. 


2 Preliminaries 


We give a brief overview of labelled transition systems and the modal $\mu$-calculus. 


Definition 1. A labelled transition system or LTS $L$ over a non-empty set of 
actions $Act$ is a triple $L = (S, \to, s_0)$, where $S$ is the set of states with $s_0 \in S$ 
and $\to \subseteq S \times Act \times S$ is the transition relation. 


The modal $\mu$-calculus is an expressive logic, subsuming LTL and CTL, for 
reasoning about the behaviours of LTSs, among others. 


Definition 2. Formulae in the modal $\mu$-calculus are given by the following 
(minimal) grammar. 


$\phi ::= true \mid false \mid X \mid \phi \wedge \phi \mid \phi \vee \phi \mid \langle a \rangle \phi \mid [a]\phi \mid \mu X.\phi \mid \nu X.\phi$ 


where $a \in Act$ is an action and $X \in \mathcal{X}$ is some propositional variable taken from 
a sufficiently large set of variables $\mathcal{X}$. 


Next to the Boolean constants and the propositional connectives, the modal 
$\mu$-calculus contains the existential diamond operator $\langle\,\rangle$ and its dual universal box 
operator $[\,]$ of modal logic as well as the least and greatest fixed point operators 
$\mu$ and $\nu$ that provide recursion used for ‘finite’ and ‘infinite’ looping, respectively. 

Given a formula $\phi$, an occurrence of a variable $X$ in $\phi$ is said to be bound 
iff this occurrence is within a formula $\psi$, where $\mu X.\psi$ or $\nu X.\psi$ is a subformula 
of $\phi$; an occurrence of a variable is free otherwise. A formula $\phi$ is closed iff all 
variables occurring in $\phi$ are bound; here we only consider closed formulae. For 
simplicity, we assume that the formulae that we consider are well-named, i.e., 
formulae do not contain two fixed point subformulae binding the same variable. 

Given an LTS, the semantics of a $\mu$-calculus formula is the set of states of 
the LTS that satisfy the formula. Since we focus on games in this paper, we 
introduce two auxiliary concepts, viz. the Fischer-Ladner closure of a formula 
and the alternation depth of a formula. The Fischer-Ladner closure $FL(\phi)$ of a 
formula $\phi$ is the smallest set of formulae satisfying 


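- $\phi \in FL(\phi)$; 
- if $\phi_1 \wedge \phi_2 \in FL(\phi)$ or $\phi_1 \vee \phi_2 \in FL(\phi)$ then $\phi_1, \phi_2 \in FL(\phi)$; 
- if $\langle a \rangle \phi_1 \in FL(\phi)$ or $[a]\phi_1 \in FL(\phi)$ then $\phi_1 \in FL(\phi)$; 
- if $\sigma X.\phi_1 \in FL(\phi)$, with $\sigma \in \{\mu, \nu\}$, then $\phi_1[X := \sigma X.\phi_1] \in FL(\phi)$. 


Note that for a closed formula $\phi$, the set $FL(\phi)$ contains no variables. 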
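
The closure rules above translate directly into a worklist computation. The following Python sketch is only an illustration: the encoding of formulae as nested tuples (tags `"true"`, `"false"`, `"var"`, `"and"`, `"or"`, `"dia"`, `"box"`, `"mu"`, `"nu"`) and the function names are assumptions, and the substitution relies on formulae being closed and well-named, as assumed in the text.

```python
from typing import Set, Tuple

Formula = Tuple  # nested tuples, e.g. ("nu", "X", ("box", "std", ("var", "X")))


def subst(f: Formula, x: str, g: Formula) -> Formula:
    """Replace the free occurrences of variable x in f by g."""
    tag = f[0]
    if tag in ("true", "false"):
        return f
    if tag == "var":
        return g if f[1] == x else f
    if tag in ("and", "or"):
        return (tag, subst(f[1], x, g), subst(f[2], x, g))
    if tag in ("dia", "box"):
        return (tag, f[1], subst(f[2], x, g))
    if tag in ("mu", "nu"):
        # an inner binder for the same variable shadows x
        return f if f[1] == x else (tag, f[1], subst(f[2], x, g))
    raise ValueError(f"unknown formula tag: {tag!r}")


def fischer_ladner(phi: Formula) -> Set[Formula]:
    """Smallest set containing phi and closed under the four rules above."""
    closure: Set[Formula] = set()
    todo = [phi]
    while todo:
        f = todo.pop()
        if f in closure:
            continue
        closure.add(f)
        tag = f[0]
        if tag in ("and", "or"):
            todo.extend([f[1], f[2]])
        elif tag in ("dia", "box"):
            todo.append(f[2])
        elif tag in ("mu", "nu"):
            todo.append(subst(f[2], f[1], f))   # unfold the fixed point
    return closure
```

For a closed formula, every variable is replaced by its fixed point formula before it can be reached, so no bare variable ends up in the computed set.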

The complexity of a $\mu$-calculus formula is given by its alternation depth; the 
larger the alternation depth, the harder the formula is to solve (and, incidentally, 
also to understand). The alternation depth of a formula $\phi$ is defined as the largest 
alternation depth of the bound propositional variables in $\phi$, defined as follows. 

Definition 3. The dependency order on bound variables of a formula $\phi$ is the 
smallest partial order $\leq_\phi$ satisfying $X \leq_\phi Y$ if $X$ occurs free in $\sigma Y.\psi$. The alternation 
depth of a $\mu$-variable $X$ in $\phi$, denoted $AD_\phi(X)$, is the maximal length of a 
chain $X_1 \leq_\phi \cdots \leq_\phi X_n$, where $X_1 = X$, variables $X_1, X_3, \ldots$ are $\mu$-variables and 
$X_2, X_4, \ldots$ are $\nu$-variables. Analogously for the alternation depth of a $\nu$-variable. 


Definition 4. A parity game is a tuple $G = (V, E, p, (V_0, V_1))$ where 


- $V$ is a finite set of vertices, partitioned into a set $V_0$ of vertices owned by 
  player 0 and a set $V_1$ of vertices owned by player 1; 

- $E \subseteq V \times V$ is the edge relation; 

- $p : V \to \mathbb{N}$ is the priority function. 


We depict parity games as graphs in which diamond-shaped vertices represent 
vertices owned by player 0 and box-shaped vertices represent vertices owned by 
player 1. Edges are annotated with configurations while priorities are typically 
written inside vertices. 

We write $v \to w$ instead of $(v, w) \in E$ and let $\alpha$ range over the set of players, 
i.e. $\alpha \in \{0, 1\}$. For a given vertex $v$, we write $vE$ to denote the set $\{w \in V \mid 
v \to w\}$ of successors of $v$. Likewise, $Ev$ denotes the set $\{w \in V \mid w \to v\}$ of 
predecessors of $v$. A sequence of vertices $v_1 \cdots v_n$ is a path if for all $1 \leq m < n$ 
we have $v_{m+1} \in v_m E$. Infinite paths are defined in a similar way. We write $\pi_n$ to 
denote the $n$-th vertex in a path $\pi$ and $\pi^{\leq n}$ to indicate the prefix $\pi_1 \cdots \pi_n$ of $\pi$. 

A play, starting in a vertex $v \in V$, starts by placing a token on that vertex. 
Players then move the token according to a single simple rule: if a token is on 
a vertex $u \in V_\alpha$, and $uE \neq \emptyset$, player $\alpha$ pushes it to some successor vertex $w \in 
uE$. The finite and infinite paths thus constructed are referred to as plays. For 
an infinite play, and the infinite sequence of priorities it induces, the parity of 
the highest priority that occurs infinitely often on that play defines its winner: 
player 0 wins if this priority is even; player 1 wins otherwise. A finite play is won 
by the player that does not own the vertex on which the token is stuck. 
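
The two winning conditions can be made concrete with a small Python sketch. It assumes an infinite play is given in ultimately periodic form, i.e. as a finite prefix followed by a cycle that repeats forever, so that the priorities occurring infinitely often are exactly those on the cycle; the function names and the dictionary encodings of $p$ and of vertex ownership are illustrative.

```python
from typing import Dict, Sequence


def winner_of_infinite_play(cycle: Sequence[str], priority: Dict[str, int]) -> int:
    """Winner of a play that eventually repeats `cycle` forever.

    Only the cycle matters: its highest priority is the highest priority
    occurring infinitely often. Even means player 0 wins, odd player 1.
    """
    if not cycle:
        raise ValueError("an infinite play needs a non-empty cycle")
    return max(priority[v] for v in cycle) % 2


def winner_of_finite_play(play: Sequence[str], owner: Dict[str, int]) -> int:
    """Winner of a play that got stuck: the opponent of the last vertex's owner."""
    return 1 - owner[play[-1]]
```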

The moves of players 0 and 1 are determined by their respective strategies. 
Informally, a strategy for a player $\alpha$ determines, for a vertex $\pi_i \in V_\alpha$, the next 
vertex $\pi_{i+1}$ that will be visited if a token is on $\pi_i$, provided $\pi_i$ has successors. 
In general, a strategy is a partial function $\sigma : V^* V_\alpha \to V$ which, for a given 
history of vertices of the locations of the token and a vertex on which the token 
currently resides, determines the next vertex by selecting an edge to that vertex. 
A finite or infinite path $\pi$ conforms to a given strategy $\sigma$ if for all prefixes $\pi^{\leq i}$ 
for which $\sigma$ is defined, we have $\pi_{i+1} = \sigma(\pi^{\leq i})$. 

A strategy $\sigma$ for player $\alpha$ is winning from a vertex $v$ iff $\alpha$ is the winner 
of every play starting in $v$ that conforms to $\sigma$. Parity games are known to be 
positionally determined [30]. This means that a vertex is won by player $\alpha$ iff $\alpha$ 
has a winning strategy that does not depend on the history of vertices visited by 
the token. Such strategies can be represented by partial functions $\sigma : V_\alpha \to V$. 
Note that every vertex in a parity game is won by one of the two players. 
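
To make the notion of winning regions tangible, here is a compact Python sketch of Zielonka's recursive algorithm for ordinary parity games, the classical algorithm mentioned in the introduction. It is not the family-based algorithm of this paper. The dictionary encoding of the game and all function names are assumptions; dead ends are handled by one simple preprocessing choice, namely replacing each by a self-loop whose priority lets the opponent of the stuck player win, in line with the rule for finite plays.

```python
from typing import Dict, Set, Tuple


def make_total(succ, owner, priority):
    """Give every dead end a self-loop that is won by the owner's opponent."""
    succ = {v: set(ws) for v, ws in succ.items()}
    priority = dict(priority)
    for v, ws in succ.items():
        if not ws:
            ws.add(v)
            priority[v] = 1 - owner[v]   # owner 0 stuck: odd; owner 1 stuck: even
    return succ, priority


def attractor(alpha, target, vertices, succ, owner):
    """Vertices of the subgame from which player alpha can force a visit to target."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in vertices - attr:
            moves = succ[v] & vertices
            if (owner[v] == alpha and moves & attr) or (owner[v] != alpha and moves <= attr):
                attr.add(v)
                changed = True
    return attr


def zielonka(vertices, succ, owner, priority) -> Tuple[Set[str], Set[str]]:
    """Winning regions (W0, W1) of a total parity game; the highest priority wins."""
    if not vertices:
        return set(), set()
    top = max(priority[v] for v in vertices)
    alpha = top % 2
    top_vertices = {v for v in vertices if priority[v] == top}
    a = attractor(alpha, top_vertices, vertices, succ, owner)
    win = list(zielonka(vertices - a, succ, owner, priority))
    if not win[1 - alpha]:
        win[alpha], win[1 - alpha] = set(vertices), set()
        return win[0], win[1]
    b = attractor(1 - alpha, win[1 - alpha], vertices, succ, owner)
    win = list(zielonka(vertices - b, succ, owner, priority))
    win[1 - alpha] |= b
    return win[0], win[1]


def solve(succ: Dict[str, Set[str]], owner: Dict[str, int], priority: Dict[str, int]):
    """Solve a parity game given by successor sets, owners and priorities."""
    succ, priority = make_total(succ, owner, priority)
    return zielonka(set(succ), succ, owner, priority)
```

The two returned sets partition the vertices, which reflects the remark that every vertex is won by one of the two players.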


Closed modal $\mu$-calculus formulae can be interpreted by associating a game 
semantics to these formulae. The definition we provide below is adopted from [30]. 

Table 1. The game semantics for a closed modal $\mu$-calculus formula $\phi$: vertex $v$ (1st 
column), its owner $\alpha$ (2nd column), its successors (if any) $w \in vE$ (3rd column), 
and priority $p(v)$ (4th column). Vertices of the form $(s, \langle a \rangle \psi)$ and $(s, [a]\psi)$ have no 
successors when $s$ has no $a$-successors. 


Vertex                         Owner   Successor(s)                                  Priority 

$(s, true)$                    1                                                     0 
$(s, false)$                   0                                                     0 
$(s, \psi_1 \wedge \psi_2)$    1       $(s, \psi_1)$ and $(s, \psi_2)$               0 
$(s, \psi_1 \vee \psi_2)$      0       $(s, \psi_1)$ and $(s, \psi_2)$               0 
$(s, [a]\psi)$                 1       $(t, \psi)$ for every $s \xrightarrow{a} t$   0 
$(s, \langle a \rangle \psi)$  0       $(t, \psi)$ for every $s \xrightarrow{a} t$   0 
$(s, \nu X.\psi)$              1       $(s, \psi[X := \nu X.\psi])$                  $2\lfloor AD_\phi(X)/2 \rfloor$ 
$(s, \mu X.\psi)$              1       $(s, \psi[X := \mu X.\psi])$                  $2\lfloor AD_\phi(X)/2 \rfloor + 1$ 


Definition 5. Let $L = (S, \to, s_0)$ be an LTS and $\phi$ be a closed modal $\mu$-calculus 
formula. A state $s \in S$ satisfies formula $\phi$, denoted by $L, s \models \phi$, iff vertex $(s, \phi)$ 
is won by player 0 in the game $G_\phi = (V, E, p, (V_0, V_1))$, where $V = S \times FL(\phi)$, 
and the sets $E$, $V_0$, and $V_1$ and priority function $p$ are given by Table 1. 


If the context is such that no confusion can arise, we write $s \models \phi$ for $L, s \models \phi$. 

For a more in-depth treatment of the modal $\mu$-calculus, we refer to [30]. 
Here, we finish by illustrating the game semantics on a small example, drawing 
inspiration from an example in [19]. 


Example 1. Consider the LTS L depicted in the bottom-left corner of Fig. 1, 
modelling a coffee machine that after inserting one or two units of some currency 
(indicated by action ins) can dispense a standard regular coffee (indicated by 
action std) or an extra large coffee (indicated by action xxl), respectively. 

The LTL-type formula $\phi$, depicted in the top-left corner of Fig. 1, asserts that 
on all infinite runs of the coffee machine, it infinitely often dispenses a regular 
coffee. (Note, nothing is required to hold on finite runs.) The parity game that 
can answer whether $s_0 \models \phi$ holds is depicted on the right in Fig. 1. Each node 
is annotated with a pair consisting of a state of the LTS and a (sub)formula 
of $\phi$. Note that the references to $\phi_1$, $\phi_2$, and $\phi_3$ are meant as an indication and 
not to be interpreted exactly, since they lack the substitution that needs to be 
carried out. We remark that the parity game is solitaire: only one player can make 
decisions. Vertex $(s_0, \phi)$ is won by player 1 by enforcing a 1-dominated infinite 
play, bypassing the vertex with priority 2 on the loop. Consequently, $s_0 \not\models \phi$. 


3 Software Product Lines Model Checking 


Software products with variability can be modelled effectively using so-called 
featured transition systems or FTSs [3]. Fix a finite non-empty set $\mathcal{F}$ of features, 
with $f$ as typical element. Let $\mathbb{B}[\mathcal{F}]$ denote the set of Boolean expressions over $\mathcal{F}$. 
Elements $\chi$ and $\gamma$ of $\mathbb{B}[\mathcal{F}]$ are referred to as feature expressions. A product $P$ is 
a set of features; $\mathcal{P}$ denotes the set of products, thus $\mathcal{P} \subseteq 2^{\mathcal{F}}$. 
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a S (so, [std])| 0 0] (so, [xxl]¢1) 
2 
— 
vX. nY. ([ins]Y A [xxl]Y ^[std]X ) (so, $1) (so, [ins] ¢1) 
a > > 0 > 0 > 0 
4n F (8o82) (80, 6s) 
1 
Y 
2 — (si, 1) 
(so, $) 
Y 
L std 0 |(s2, [xxl]o1) 0 k— 0 |(s1, $2) 
kits i (s1, [std] ¢) 
(60) = >(s1) O) (s2162) (82:61) \ 
ee (s2, ¢3)| 0 < OK 1K OK 0 |(s1, $3) 
xxl (s1, lins]ġ1) 
Y Y Y 
(s2, [ins]ġ1)| 0 0 |(s2, [std]ġ) (s1, [xxl] ¢1)] 0 


Fig. 1. Parity game encoding the model checking problem so |= ¢ 


A feature expression γ, as Boolean expression over 𝓕, can be interpreted as
a set of products 𝒫_γ, viz. all products P for which the induced truth assignment
(true for f ∈ P, false for f ∉ P) validates γ. Reversely, for each family P ⊆ 𝒫 we
fix a feature expression γ_P to represent it. The constant ⊤ denotes the feature
expression that is always true. We now recall FTSs from [4] as a model for
software product lines, using the notation of [19,22].


Definition 6. An FTS F over Act and 𝓕 is a triple F = (S, θ, s₀), where S
is the set of states with s₀ ∈ S and θ : S × Act × S → B[𝓕] is the transition
constraint function.


For states s, t ∈ S, we write s --a|γ-->_F t if θ(s, a, t) = γ and γ ≢ ⊥. The projection
of F onto a product P ∈ 𝒫 is the LTS F|P = (S, →_{F|P}, s₀) over Act with
s --a-->_{F|P} t iff P ∈ 𝒫_γ, for a transition s --a|γ-->_F t of F.


Example 2. Assume that the coffee machine from Example 1 is to model a family 
of coffee machines for different countries, depending on whether a coffee machine 
accepts the insertion of dollars or euros, or both. Let P be a product line of coffee 
machines, with the independent features $ and €, representing the presence of a 
coin slot accepting dollars or euros, respectively, leading to a set of four products: 
{Ø, {$}, {€}, {$,€}}. The FTS F below models the family behaviour of P. 


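[Figure (image not reproduced): the FTS F, with edge labels including std|€ and xxl|⊤, together with the LTSs F|P₁ and F|P₂.]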


The idea is that extra large coffee is exclusively available for 2 dollars, whereas
1 euro or dollar suffices for a standard regular coffee. The behaviour of products
P₁ = {$} and P₂ = {€} is modelled by the LTSs F|P₁ and F|P₂ depicted above.
Note that coffee machine F|P₁ accepting only dollars lacks the transition from
s₁ to s₀ requiring feature €, while coffee machine F|P₂ accepting only euros lacks
the one from s₁ to s₂ requiring feature $. The behaviour of product P₃ = {$, €}
is modelled by the LTS L = F|{$,€} depicted in Fig. 1. Finally, the product
without any features is not depicted, but it deadlocks at state s₁.
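
The projection of an FTS onto a product can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions, not the authors' implementation: feature expressions are modelled as Python predicates over a product, the names `project` and `COFFEE_FTS` are hypothetical, and the guard ⊤ on the transition from s₀ to s₁ is inferred from the remark that the featureless product deadlocks at s₁.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Product = FrozenSet[str]                  # a product is a set of feature names
FeatureExpr = Callable[[Product], bool]   # a feature expression, evaluated on a product
Transition = Tuple[str, str, str]         # (source state, action, target state)


def project(fts: Dict[Transition, FeatureExpr], product: Product) -> Set[Transition]:
    """Transitions of F|P: keep a transition iff P satisfies its feature expression."""
    return {t for t, gamma in fts.items() if gamma(product)}


# Assumed encoding of the coffee machine family (guards as described in the text).
COFFEE_FTS: Dict[Transition, FeatureExpr] = {
    ("s0", "ins", "s1"): lambda p: True,        # assumed guard: always available
    ("s1", "ins", "s2"): lambda p: "$" in p,    # second coin requires dollars
    ("s1", "std", "s0"): lambda p: "€" in p,    # regular coffee requires euros
    ("s2", "xxl", "s0"): lambda p: True,        # xxl | ⊤
}

dollars_only = project(COFFEE_FTS, frozenset({"$"}))
euros_only = project(COFFEE_FTS, frozenset({"€"}))
no_features = project(COFFEE_FTS, frozenset())
```

Under these assumptions, `dollars_only` lacks the std transition and `euros_only` lacks the second ins transition, matching the description of F|P₁ and F|P₂.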


Definition 7. The SPL model checking problem is to compute, for a given
FTS F = (S, θ, s₀) and closed modal μ-calculus formula φ, the largest subsets 𝒫⁺
and 𝒫⁻ of 𝒫 such that F|P, s₀ ⊨ φ for all P ∈ 𝒫⁺ and F|P, s₀ ⊭ φ for all P ∈ 𝒫⁻.


Sets 𝒫⁺ and 𝒫⁻ partition 𝒫: a formula either does or does not hold in a state.


Example 3. It is not difficult to see that the formula φ from Example 1 does
not hold for all products. In fact, 𝒫⁺ = {∅, {€}} and 𝒫⁻ = {{$}, {$,€}}. For
products with feature $, there is an infinite run that avoids action std altogether,
whereas for products not containing feature $, either all runs are finite, or all
infinite runs contain an infinite number of std actions.


4 Variability Parity Games and SPL Model Checking 


In practice, the model checking problem for LTSs, yielding a yes/no answer, 
can efficiently be decided using parity game solving algorithms [27,30]. The SPL 
model checking problem can be solved in a similar fashion by constructing parity 
games associated with the formula and with each individual product separately. 
Such an approach, however, does not take full advantage of the efficient, compact 
representation of the variation points in the individual product LTSs represented 
by an FTS. The variability parity games we introduce in Section 4.1 exploit
constructs similar to those in FTSs to compactly encode variation points in 
the parity games they represent. We show in Section 4.2 that the SPL model 
checking problem can be solved by solving such variability parity games. 


4.1 Variability Parity Games 


A variability parity game is a generalisation of a parity game. It is a two-player 
game, again played by players odd, denoted by 1, and even, denoted by 0, on a 
finite directed graph. Contrary to parity games, an edge in a variability parity 
game is associated with a set of configurations. 


Definition 8 (Variability Parity Game). A variability parity game G is a
sextuple G = (V, E, ℭ, p, θ, (V₀, V₁)), where


— V is a finite set of vertices, partitioned into sets V₀ and V₁ of vertices owned
by player 0 and player 1, respectively;

— E ⊆ V × V is the edge relation;

— ℭ is a finite set of configurations;

— p : V → ℕ is the priority function that assigns priorities to vertices;

— θ : E → 2^ℭ \ {∅} is the configuration mapping.

In line with our depiction of parity games, we visualise variability parity games
as graphs with diamond-shaped and box-shaped vertices, and directed edges
connecting vertices. Moreover, edges are annotated with configurations. A
variability parity game G = (V, E, ℭ, p, θ, (V₀, V₁)) is called total if, for all u ∈ V, it
holds that ⋃{ θ(u, v) | v ∈ V, (u, v) ∈ E } = ℭ.

As before, we write v → w for (v, w) ∈ E, and we use α to range over {0, 1}.
We use v --c--> w to denote v → w and c ∈ θ(v, w) and say that the edge between v
and w is compatible with c. The notions of a finite and infinite path from parity
games carry over to variability parity games, and we use similar notation to
denote the prefixes of a path and the vertices along a path. A finite path v₁ ⋯ v_n
is admitted for a configuration c ∈ ℭ iff for all m < n, c ∈ θ(v_m, v_{m+1}). In a
similar vein, an infinite path can be said to be admitted for a given configuration.

A play starts by placing a configured token c ∈ ℭ on vertex v ∈ V. The
players move configured token c in the game according to the following rule: if
token c ∈ ℭ is on some vertex v ∈ V_α, player α pushes c, if possible, to some
adjacent vertex w along an edge compatible with c, i.e. c ∈ θ(v, w). The finite
and infinite paths thus constructed are admitted by c, and are again referred to
as plays; the conditions for players 0 and 1 for winning such plays are identical
to those for parity games.

For a configuration c ∈ ℭ, a strategy is a partial function σ_c : V*V_α → V
which, when defined for π^{≤i}, yields a vertex π_{i+1} that is reachable from π_i via an
edge that is compatible with c. A path π, admitted by configuration c, conforms
to a given strategy σ_c iff for all prefixes π^{≤i} for which σ_c is defined, we have
π_{i+1} = σ_c(π^{≤i}). Strategy σ_c for player α and configuration c is winning from a
vertex v iff α is the winner of every play starting in v that conforms to σ_c.


Definition 9. The variability parity game solving problem for a vertex v is the
problem of computing the largest sets of configurations C₀, C₁ ⊆ ℭ such that:


— player 0 has a winning strategy for v for each c ∈ C₀;
— player 1 has a winning strategy for v for each c ∈ C₁.


For a given variability parity game G and a configuration c ∈ ℭ, we define the
projection of G onto c, denoted G|c, as the parity game obtained by retaining only
those edges from G that are compatible with c. We note that it follows rather
immediately that variability parity games are also positionally determined: player 0
(player 1, respectively) has a winning strategy σ_c for vertex v for configuration c
iff she has a winning strategy for v in the projection of the variability parity
game onto configuration c. Since parity games are positionally determined, so
are variability parity games. Consequently, the variability parity game solving
problem asks for the computation of a partition of the set of configurations ℭ.
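
To make the projection G|c concrete, the following Python sketch filters the edge relation by a configuration. It is an illustration under assumptions rather than the prototype's code: the configuration mapping is stored as a dictionary from edges to sets of configurations, and the name `project_game` and the three-vertex example are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Vertex = str
Config = str
Edge = Tuple[Vertex, Vertex]


def project_game(theta: Dict[Edge, FrozenSet[Config]], c: Config) -> Set[Edge]:
    """Edge relation of G|c: retain exactly the edges compatible with c."""
    return {edge for edge, configs in theta.items() if c in configs}


# Hypothetical three-vertex game over the configurations {"c1", "c2"}.
theta: Dict[Edge, FrozenSet[Config]] = {
    ("u", "v"): frozenset({"c1", "c2"}),
    ("v", "u"): frozenset({"c1"}),
    ("v", "w"): frozenset({"c2"}),
    ("w", "w"): frozenset({"c1", "c2"}),
}

edges_c1 = project_game(theta, "c1")
edges_c2 = project_game(theta, "c2")
```

Vertices, owners, and priorities are unchanged by the projection, so only the edge relation needs to be recomputed per configuration.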


4.2 Solving SPL Model Checking Using Variability Parity Games 


If we ignore the representation of the sets of configurations decorating the edges,
a variability parity game is a compact representation of a set of parity games. The
next definition shows how to exploit these configurations to efficiently encode the
SPL model checking problem as a variability parity game solving problem, based
on the game-based semantics of the modal μ-calculus we presented in Section 2.


Table 2. Transformation of the SPL model checking problem to the variability parity
game solving problem. For a given vertex v (1st column), its owner α (2nd column),
successors w ∈ vE (3rd column) and configuration mapping θ(v, w) (3rd column), and
priority p(v) (4th column) are given.


Vertex          Owner   Successor(s) | Configurations            Priority

(s, true)       1                                                0

(s, false)      0                                                0

(s, ψ₁ ∧ ψ₂)    1       (s, ψ₁) | 𝒫 and (s, ψ₂) | 𝒫              0

(s, ψ₁ ∨ ψ₂)    0       (s, ψ₁) | 𝒫 and (s, ψ₂) | 𝒫              0

(s, [a]ψ)       1       (t, ψ) | 𝒫_γ for every s --a|γ--> t      0

(s, ⟨a⟩ψ)       0       (t, ψ) | 𝒫_γ for every s --a|γ--> t      0

(s, νX. ψ)      1       (s, ψ[X := νX. ψ]) | 𝒫                   2⌊AD_φ(X)/2⌋

(s, μX. ψ)      1       (s, ψ[X := μX. ψ]) | 𝒫                   2⌊AD_φ(X)/2⌋ + 1


Definition 10. Let F = (S, θ_F, s₀) be an FTS, let 𝒫 be the set of all products,
and let φ be a closed modal μ-calculus formula. The variability parity game F_φ =
(V, E, ℭ, p, θ, (V₀, V₁)) associated with F and φ, with V = S × FL(φ) and ℭ = 𝒫,
is defined by the rules given in Table 2.


Note that the size of the graph underlying variability parity game F_φ, measured
in terms of |V| + |E|, is linear in the size of formula φ and the FTS F, measured
in terms of |S| + |{(s, a, t) ∈ S × Act × S | θ_F(s, a, t) ≠ ⊥}|. Hence, the structural
information in an FTS is compactly reflected in the variability parity game which
encodes the SPL model checking problem for the FTS. The correctness of the
encoding is expressed by Theorem 1.


Theorem 1. For a given FTS F, a closed modal μ-calculus formula φ, and a
product P, we have F|P, s ⊨ φ iff player 0 wins the vertex (s, φ) for configuration
P in the variability parity game F_φ associated to F and φ.


Proof (sketch). Fix an FTS F and a closed modal μ-calculus formula φ. Let P
be a product. It is not hard to show that the parity game we obtain by encoding
the model checking problem F|P, s ⊨ φ (cf. Definition 5) is isomorphic to the
projection of F_φ onto P, viz. F_φ|P.


We revisit the SPL model checking problem of Example 3, illustrating the encod- 
ing of Definition 10. By abuse of notation, we write feature expressions instead 
of sets of configurations in variability parity games associated to SPL model 
checking problems. 


Example 4. Consider the FTS F of Example 2 and the modal μ-calculus formula φ
of Example 1, both for convenience repeated in Fig. 2. The variability
parity game F_φ encoding the SPL model checking problem for F and φ is depicted
on the right in Fig. 2 (ignoring all dashed self loops for now). We omitted
most state annotations to yield a more readable figure.

Fig. 2. Variability parity game encoding the SPL model checking problem for F and φ.


Observe that the graph structure of the variability parity game F_φ is the same
as that of the parity game of Example 1 in Fig. 1. The construction of the
variability parity game differs from that of the parity game only in the
edge annotations. Furthermore, note that vertex (s₀, φ) is
won by player 0 for the set of configurations ¬$, whereas player 1 wins for the
set of configurations $: for configurations containing the feature $, player 1 can
essentially reuse the strategy of Example 1, avoiding the vertex with priority 2.
For configurations not containing the feature $, this option is not available, since
the vertex (s₁, [ins]φ₁) is a sink. For products with feature € but not $, the only
infinite play infinitely often visits vertex (s₀, φ). For products without features
€ and $ all plays starting in (s₀, φ) are finite. Hence, by Theorem 1, the solution
to the SPL model checking problem is the pair (¬$, $), as expected.


5 Recursively Solving Variability Parity Games 


Given a variability parity game G and a vertex v of G, a straightforward way 
of solving the variability parity game problem for v is by simply solving the 
standard parity game problem G|c for every c ∈ ℭ. In doing so, however, we ignore
that players can potentially use (parts of) a single strategy for possibly many 
different configurations. As opposed to the above solving strategy, to which we 
refer as the individual solving strategy, we investigate an alternative for variability 
parity games, called the collective solving strategy. 

We provide an algorithm, Algorithm 1, for solving variability parity games 
inspired by the classical recursive algorithm for solving parity games [27]. The 
recursive algorithm is, despite its unappealing theoretical worst-case complexity, 
in practice one of the most effective algorithms for solving parity games [28, 29]. 
It is a divide-and-conquer algorithm that relies on two building blocks, viz. the
concept of a subgame computation and of an attractor computation. We generalise
and adapt these concepts to the setting of variability parity games.

Fix a variability parity game G = (V, E, ℭ, p, θ, (V₀, V₁)). For simplicity we
assume that G is total. This is not a limitation; any variability parity game can
be turned into a total one. The auxiliary notion of a restriction is a mapping
ϱ : V → 2^ℭ which, for a variability parity game G, indicates which configurations
are under consideration for a vertex. Given such a restriction ϱ, we say that a
vertex v for configuration c ∈ ℭ is won by player α in the game G restricted to ϱ
iff c ∈ ϱ(v) and the winning strategy for α only passes through vertices v′ for
which c ∈ ϱ(v′). We say that G is total with respect to ϱ iff for all v ∈ V and all
c ∈ ϱ(v), there is a vertex w such that w ∈ vE and c ∈ θ(v, w) ∩ ϱ(w).

Let U, U′ : V → 2^ℭ be arbitrary mappings. The union of U and U′, denoted
U ∪ U′, is defined point-wise, i.e. (U ∪ U′)(v) = U(v) ∪ U′(v). We say that
mapping U is a sub-mapping of ϱ iff for all v ∈ V we have U(v) ⊆ ϱ(v). The
reduction of ϱ with respect to a sub-mapping U, denoted ϱ\U, is a new restriction
defined as (ϱ\U)(v) = ϱ(v)\U(v).
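
The three operations on mappings translate directly into point-wise set operations. The Python sketch below is an illustration only; it assumes that all mappings are dictionaries over the same vertex set, and the function names `union`, `is_sub_mapping`, and `reduction` are hypothetical.

```python
from typing import Dict, FrozenSet

Mapping = Dict[str, FrozenSet[str]]  # vertex -> set of configurations


def union(u1: Mapping, u2: Mapping) -> Mapping:
    """Point-wise union: (U ∪ U')(v) = U(v) ∪ U'(v)."""
    return {v: u1[v] | u2[v] for v in u1}


def is_sub_mapping(u: Mapping, rho: Mapping) -> bool:
    """U is a sub-mapping of rho iff U(v) ⊆ rho(v) for every vertex v."""
    return all(u[v] <= rho[v] for v in rho)


def reduction(rho: Mapping, u: Mapping) -> Mapping:
    """Reduction of rho with respect to U: (rho \\ U)(v) = rho(v) \\ U(v)."""
    return {v: rho[v] - u[v] for v in rho}


# Hypothetical restriction and sub-mapping over two vertices.
rho = {"v": frozenset({"c1", "c2"}), "w": frozenset({"c1"})}
u = {"v": frozenset({"c1"}), "w": frozenset()}
reduced = reduction(rho, u)
```

For a sub-mapping U of ϱ, the union of U and ϱ\U gives back ϱ, which is the decomposition the recursive algorithm relies on.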

For a given sub-mapping U : V → 2^ℭ of a restriction ϱ, the α-attractor
towards U is a sub-mapping of ϱ which assigns those configurations to a vertex
for which player α can force the play to reach some vertex v for which that
configuration belongs to U(v). Formally, we define Attr_α(U), in the context of ϱ
and G, as Attr_α(U)(v) = ⋃_{i≥0} Attr_α^i(U)(v), where


Attr_α^0(U)(v) = U(v)
Attr_α^{i+1}(U)(v) = Attr_α^i(U)(v)
    ∪ { c ∈ ϱ(v) | v ∈ V_α ∧ ∃w ∈ vE : c ∈ θ(v, w) ∩ ϱ(w) ∩ Attr_α^i(U)(w) }
    ∪ { c ∈ ϱ(v) | v ∈ V_ᾱ ∧ ∀w ∈ vE : c ∈ (ℭ \ (θ(v, w) ∩ ϱ(w))) ∪ Attr_α^i(U)(w) }


Thus, in case v ∈ V_α and c ∈ ϱ(v), configuration c is in Attr_α^{i+1}(U)(v) if for a
move by player α to some vertex w allowed for configuration c, the sub-attractor
Attr_α^i(U)(w) can be reached. In case v ∈ V_ᾱ and c ∈ ϱ(v), configuration c is in
Attr_α^{i+1}(U)(v) if all moves for player ᾱ are not allowed for configuration c or lead
to a vertex w for which c is in the sub-attractor Attr_α^i(U)(w) for player α.


Example 5. Reconsider the variability parity game of Example 4. First, observe
that it is not total. In this case, the variability parity game can be made total
without changing the solution by taking into account also the dashed self loops.

Let ϱ(v) = ℭ and define U(s₀, φ) = ℭ and U(v) = ∅ for all v ≠ (s₀, φ).
For vertex (s₀, φ) we have Attr₀(U)(s₀, φ) = {∅, {$}, {€}, {$, €}}. All vertices v′
on the (single) path starting in (s₀, [ins]φ₁) and ending in (s₁, [std]φ) satisfy
Attr₀(U)(v′) = {{$, €}}. The remaining vertices v′ satisfy Attr₀(U)(v′) = ∅.
Note that for no configuration the immediate predecessor of (s₀, [ins]φ₁) is
attracted to U because of the escape to the sink that player 1 can use.


Algorithm 1 Recursive Algorithm for a fixed variability parity game G =
(V, E, ℭ, p, θ, (V₀, V₁)). Given a restriction ϱ : V → 2^ℭ, the algorithm returns
a pair of functions (W₀, W₁) where W₀, W₁ : V → 2^ℭ denote, for each vertex,
which set of configurations is won by player 0 (player 1, respectively).


 1: function SOLVE(ϱ)
 2:   if ϱ = λv ∈ V. ∅ then
 3:     (W₀, W₁) ← (λv ∈ V. ∅, λv ∈ V. ∅)
 4:   else
 5:     m ← max{ p(v) | v ∈ V ∧ ϱ(v) ≠ ∅ }
 6:     α ← m mod 2
 7:     U ← λv ∈ V. { c ∈ ϱ(v) | p(v) = m }
 8:     A ← Attr_α(U)
 9:     (W₀′, W₁′) ← SOLVE(ϱ\A)
10:     if W_ᾱ′ = λv ∈ V. ∅ then
11:       W_α ← W_α′ ∪ A
12:       W_ᾱ ← W_ᾱ′
13:     else
14:       B ← Attr_ᾱ(W_ᾱ′)
15:       (W₀″, W₁″) ← SOLVE(ϱ\B)
16:       W_α ← W_α″
17:       W_ᾱ ← W_ᾱ″ ∪ B
18:     end if
19:   end if
20:   return (W₀, W₁)
21: end function


We have the following result, which can be proven by induction on i following
the definition of Attr_α(U)(v) = ⋃_{i≥0} Attr_α^i(U)(v).


Lemma 1. Let G = (V, E, ℭ, p, θ, (V₀, V₁)) be a variability parity game, let ϱ :
V → 2^ℭ be a restriction, and let α be an arbitrary player. Then for all
sub-mappings U of ϱ, also Attr_α(U) is a sub-mapping of ϱ.


Totality of a game is preserved for the complements of attractors of sub-mappings. 


Lemma 2. Let G = (V, E, ℭ, p, θ, (V₀, V₁)) be a variability parity game and let
ϱ : V → 2^ℭ be a restriction such that G is total with respect to ϱ. Then G is total
with respect to ϱ\Attr_α(U) for all sub-mappings U of ϱ and each player α.


Proof. Let G and ϱ be as stated. Consider an arbitrary sub-mapping U : V → 2^ℭ of ϱ, and
let A = Attr_α(U) be the α-attractor towards U. By Lemma 1, A is a sub-mapping
of ϱ. Towards a contradiction, assume that G is not total with respect to ϱ\A.
Then there is some vertex v ∈ V and some configuration c ∈ (ϱ\A)(v) such
that for all w ∈ vE, if c ∈ θ(v, w) then c ∉ (ϱ\A)(w). Pick such a vertex v and
configuration c. Since G is total with respect to ϱ, we know that there is at least
one w ∈ vE with c ∈ θ(v, w) and c ∈ ϱ(w). Let w ∈ vE be such that c ∈ θ(v, w)
and c ∈ ϱ(w). It then follows that c ∉ (ϱ\A)(w), and, hence, c ∈ A(w). So, for
all w ∈ vE for which c ∈ θ(v, w) and c ∈ ϱ(w) we have c ∈ A(w). But then, by
definition of α-attractor, also c ∈ A(v). Contradiction, since c ∈ (ϱ\A)(v).


We proceed with the following result regarding the propagation of winning with
respect to a sub-mapping along an attractor.

Lemma 3. Let G = (V, E, ℭ, p, θ, (V₀, V₁)) be a variability parity game and let
ϱ : V → 2^ℭ be a restriction. Let α be an arbitrary player and suppose U is a
sub-mapping of ϱ. If for all v ∈ V, player α wins vertex v for all configurations
c ∈ U(v), then α wins vertex v for all configurations c ∈ Attr_α(U)(v).


Proof. Let ϱ, α and U be as stated. We proceed by induction on i with respect
to the definition of Attr_α^i(U).

Base case (i = 0): Follows by assumption. Induction step (i > 0): Suppose
player α wins vertex v for all configurations c ∈ Attr_α^i(U)(v). Pick an arbitrary
vertex v′ and configuration c′ ∈ Attr_α^{i+1}(U)(v′). Since c′ ∈ Attr_α^{i+1}(U)(v′), we
have c′ ∈ ϱ(v′). If c′ ∈ Attr_α^i(U)(v′), the result follows instantly by induction. If
c′ ∉ Attr_α^i(U)(v′), then we distinguish two cases.

Case v′ ∈ V_α: Then there must be some w ∈ v′E such that c′ ∈ θ(v′, w) and
c′ ∈ Attr_α^i(U)(w). Let w be such. Then player α can play a c′-configured token
from v′ to w and, by induction, win vertex w for configuration c′. But then she
also wins vertex v′ for configuration c′.

Case v′ ∈ V_ᾱ: Then, for all w ∈ v′E such that c′ ∈ θ(v′, w), also c′ ∈
Attr_α^i(U)(w). Hence, regardless of how player ᾱ moves the c′-configured token
from v′ along an edge admitting c′, she will end up in a vertex that, by induction,
is won by α for configuration c′.


The next theorem captures the correctness of Algorithm 1. 


Theorem 2. Let G = (V, E, ℭ, p, θ, (V₀, V₁)) be a variability parity game and
let ϱ : V → 2^ℭ be a restriction such that G is total with respect to ϱ. Then
SOLVE(ϱ) returns the mappings W₀, W₁ : V → 2^ℭ such that for all v ∈ V,
W₀(v) ∪ W₁(v) = ℭ and both for player 0 and 1, for each c ∈ W_α(v), player α
wins vertex v for configuration c.


Proof. Fix a total variability parity game G = (V, E, ℭ, p, θ, (V₀, V₁)). We prove a
slightly stronger property, viz. for all restrictions ϱ : V → 2^ℭ such that G is total
with respect to ϱ, procedure SOLVE(ϱ) returns mappings W₀, W₁ : V → 2^ℭ that
are sub-mappings of ϱ such that for all v ∈ V it holds that W₀(v) ∪ W₁(v) = ϱ(v)
and player α wins vertex v for each configuration c ∈ W_α(v). Let us define
|ϱ| = Σ_{v∈V} |ϱ(v)|. The proof will proceed by induction on |ϱ| and closely follows
the standard proofs of correctness for parity games.

Base case: We have ϱ(v) = ∅ for all v ∈ V. Consequently, the algorithm
returns the functions W₀ and W₁ satisfying W₀(v) = W₁(v) = ∅ for all v ∈ V.
Trivially W₀ and W₁ satisfy the statement.

Induction step: Let ϱ be a restriction such that G is total with respect to ϱ.
As our induction hypothesis, assume that the statement holds for all ϱ′ such that
|ϱ′| < |ϱ|. Let m be the maximal priority among those vertices in G for which ϱ
yields a non-empty set of configurations, and let α be m mod 2. Let U be the
sub-mapping of ϱ for which U(v) = ϱ(v) if p(v) = m, and U(v) = ∅ otherwise,
and let A be the sub-mapping Attr_α(U). By Lemma 2, G is total with respect
to ϱ\A, and hence, by induction, the functions W₀′, W₁′ returned by SOLVE(ϱ\A)
satisfy the statement. Next, we distinguish two cases.

Case W_ᾱ′(v) = ∅ for all v. Then, by our induction hypothesis, player α wins all
vertices v for configurations c ∈ W_α′(v) in the game restricted to ϱ\A. Regarding
the remaining vertices, note that for vertices v ∈ V_ᾱ and configurations c ∈ W_α′(v)
with an edge to a vertex w with c ∈ A(w), player ᾱ may escape to such vertices.
However, then α can force the play to visit a vertex with priority m. Remaining
in vertices with priority m means losing for ᾱ. Playing to any vertex other than
those in U leads to a play that remains either in W_α′ or infinitely often revisits U.
In either case, α wins such plays. For vertices v ∈ V_α and configurations c ∈ ϱ(v),
player α either follows the winning strategy in W_α′, or the attractor strategy for A
towards a vertex in U. Consequently, α wins all vertices v for all configurations
c ∈ ϱ(v), which is consistent with W_α and W_ᾱ as returned by SOLVE.

Case W_ᾱ′(v) ≠ ∅ for some v. Since player ᾱ wins any vertex v for configuration
c ∈ W_ᾱ′(v) in the game restricted to ϱ\A, and player α cannot force the play
to a vertex w for which c ∈ A(w), player ᾱ also wins all such vertices and
configurations in G restricted to ϱ. By Lemma 3, ᾱ thus also wins all vertices v
for configurations c ∈ B(v), where B = Attr_ᾱ(W_ᾱ′). By Lemma 2, G is total with
respect to ϱ\B, and hence, by induction, the functions W₀″, W₁″ returned by the call
SOLVE(ϱ\B) satisfy the statement. It then follows that player α wins all vertices v
for configurations c ∈ W_α″(v) and player ᾱ wins all vertices v for configurations
c ∈ (W_ᾱ″ ∪ B)(v) as set by SOLVE.
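
As a hedged illustration of Algorithm 1, the following Python sketch transliterates SOLVE, computing attractors by naive fixed-point iteration of the attractor definition rather than by a queue. It is not the C++/BDD prototype: sets of configurations are plain `frozenset`s, and the class `VPG`, the helper names, and the two-vertex example game are assumptions made for this sketch.

```python
from typing import Dict, FrozenSet, Tuple

Mapping = Dict[str, FrozenSet[str]]  # vertex -> set of configurations


class VPG:
    """Variability parity game; theta maps each edge (v, w) to its configurations."""

    def __init__(self, owner, priority, theta):
        self.owner = owner        # vertex -> 0 or 1
        self.priority = priority  # vertex -> natural number
        self.theta = theta        # (v, w) -> frozenset of configurations
        self.succ = {v: [] for v in owner}
        for v, w in theta:
            self.succ[v].append(w)


def minus(rho: Mapping, u: Mapping) -> Mapping:
    return {v: rho[v] - u[v] for v in rho}


def union(u1: Mapping, u2: Mapping) -> Mapping:
    return {v: u1[v] | u2[v] for v in u1}


def attr(g: VPG, rho: Mapping, alpha: int, u: Mapping) -> Mapping:
    """alpha-attractor towards u within rho, iterated until nothing changes."""
    a = dict(u)
    changed = True
    while changed:
        changed = False
        for v in g.owner:
            if g.owner[v] == alpha:
                # Some move allowed for c leads into the attractor.
                gain = frozenset()
                for w in g.succ[v]:
                    gain |= rho[v] & g.theta[v, w] & a[w]
            else:
                # Every move allowed for c (within rho) leads into the attractor.
                gain = rho[v]
                for w in g.succ[v]:
                    gain -= (g.theta[v, w] & rho[w]) - a[w]
            if not gain <= a[v]:
                a[v] = a[v] | gain
                changed = True
    return a


def solve(g: VPG, rho: Mapping) -> Tuple[Mapping, Mapping]:
    empty = {v: frozenset() for v in g.owner}
    if all(not rho[v] for v in g.owner):
        return empty, dict(empty)
    m = max(g.priority[v] for v in g.owner if rho[v])
    alpha = m % 2
    u = {v: rho[v] if g.priority[v] == m else frozenset() for v in g.owner}
    a = attr(g, rho, alpha, u)
    first = solve(g, minus(rho, a))
    won = [empty, empty]
    if all(not first[1 - alpha][v] for v in g.owner):
        won[alpha] = union(first[alpha], a)
    else:
        b = attr(g, rho, 1 - alpha, first[1 - alpha])
        second = solve(g, minus(rho, b))
        won[alpha] = second[alpha]
        won[1 - alpha] = union(second[1 - alpha], b)
    return won[0], won[1]


# Hypothetical total game: for c1 the cycle a -> b -> a has top priority 2,
# for c2 player 1 is stuck on the self loop of b with priority 1.
game = VPG(
    owner={"a": 0, "b": 1},
    priority={"a": 2, "b": 1},
    theta={("a", "b"): frozenset({"c1", "c2"}),
           ("b", "a"): frozenset({"c1"}),
           ("b", "b"): frozenset({"c2"})},
)
full = {v: frozenset({"c1", "c2"}) for v in game.owner}
w0, w1 = solve(game, full)
```

In this small example player 0 wins both vertices for c1 and player 1 wins both for c2, which agrees with solving the two projections separately.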


Algorithm 1 requires that the attractor Attr_α(U) for a sub-mapping U can be
computed (cf. line 8 of the algorithm). To cater for this, the attractor
computation for sub-mappings can be implemented following the pseudo-code of
Algorithm 2, the correctness of which is claimed by Lemma 4.


Lemma 4. For a restriction ϱ : V → 2^ℭ, a sub-mapping U : V → 2^ℭ of ϱ and
a player α, ATTR(α, U) terminates and returns a sub-mapping A of ϱ satisfying
A = Attr_α(U).


Algorithm 2 is actually a straightforward implementation of the definition of
the attractor set computation following the high-level structure of the attractor
computation for standard parity games. We forego a detailed proof of Lemma 4,
which, for soundness, uses an invariant stating that the computed sub-mapping
A under-approximates Attr_α(U) and for completeness uses an invariant that
asserts for all configurations c ∈ Attr_α(U)(v) either c ∈ A(v) or there is a vertex
v′ ∈ Q and attractor strategy underlying Attr_α(U)(v) inducing a play for c,
starting in v, visiting v′ and not visiting vertices v″ with c ∈ A(v″) in between.

Instead, we briefly explain the underlying intuition. The algorithm conducts a typical
backwards reachability analysis, maintaining a queue Q of vertices that are at
the frontier of the search for at least some configurations. For each vertex w in
this frontier, its predecessors v ∈ Ew are inspected in a for-loop. Either such a
predecessor is owned by player α, in which case all configurations that can reach
w in one step are added to the attractor set for v, or such a predecessor is owned
by player ᾱ, in which case all v’s successors must be inspected, and only those
configurations c of v for which all their successor options are to move to some
vertex w′ already satisfying c ∈ A(w′) are added to its attractor.

Algorithm 2 Attractor computation. Given a variability parity game G =
(V, E, ℭ, p, θ, (V₀, V₁)), a restriction ϱ : V → 2^ℭ and a sub-mapping U of ϱ,
the algorithm computes the α-attractor towards U.


 1: function ATTR(α, U)
 2:   Queue Q ← { v ∈ V | U(v) ≠ ∅ }
 3:   A ← U
 4:   while Q is not empty do
 5:     w ← Q.pop()
 6:     for every v ∈ Ew such that ϱ(v) ∩ θ(v, w) ∩ A(w) ≠ ∅ do
 7:       if v ∈ V_α then
 8:         a ← ϱ(v) ∩ θ(v, w) ∩ A(w)
 9:       else
10:         a ← ϱ(v)
11:         for w′ ∈ vE such that ϱ(v) ∩ θ(v, w′) ∩ ϱ(w′) ≠ ∅ do
12:           a ← a ∩ (ℭ \ (θ(v, w′) ∩ ϱ(w′)) ∪ A(w′))
13:         end for
14:       end if
15:       if a \ A(v) ≠ ∅ then
16:         A(v) ← A(v) ∪ a
17:         if v ∉ Q then Q.push(v)
18:       end if
19:     end for
20:   end while
21:   return A
22: end function
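
The queue-based scheme of Algorithm 2 can be transliterated into Python roughly as follows. This is a sketch under assumptions, not the prototype's BDD-based code: configurations are elements of plain `frozenset`s, the game is passed as dictionaries, and the function name `attr_queue`, its parameter list, and the two-vertex example are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Tuple

Mapping = Dict[str, FrozenSet[str]]  # vertex -> set of configurations


def attr_queue(owner: Dict[str, int],
               theta: Dict[Tuple[str, str], FrozenSet[str]],
               configs: FrozenSet[str],
               rho: Mapping, alpha: int, u: Mapping) -> Mapping:
    """alpha-attractor towards u within rho, by backwards reachability."""
    succ = {v: [] for v in owner}
    pred = {v: [] for v in owner}
    for v, w in theta:
        succ[v].append(w)
        pred[w].append(v)
    a = dict(u)
    queue = deque(v for v in owner if u[v])
    queued = set(queue)
    while queue:
        w = queue.popleft()
        queued.discard(w)
        for v in pred[w]:
            if not (rho[v] & theta[v, w] & a[w]):
                continue  # no configuration can newly reach A via this edge
            if owner[v] == alpha:
                gain = rho[v] & theta[v, w] & a[w]
            else:
                gain = rho[v]
                for w2 in succ[v]:
                    if rho[v] & theta[v, w2] & rho[w2]:
                        gain &= (configs - (theta[v, w2] & rho[w2])) | a[w2]
            if gain - a[v]:
                a[v] = a[v] | gain
                if v not in queued:
                    queue.append(v)
                    queued.add(v)
    return a


# Hypothetical total game over configurations c1 and c2.
owner = {"a": 0, "b": 1}
theta = {("a", "b"): frozenset({"c1", "c2"}),
         ("b", "a"): frozenset({"c1"}),
         ("b", "b"): frozenset({"c2"})}
configs = frozenset({"c1", "c2"})
rho = {v: configs for v in owner}

towards_a = attr_queue(owner, theta, configs, rho, 0,
                       {"a": configs, "b": frozenset()})
towards_b = attr_queue(owner, theta, configs, rho, 1,
                       {"a": frozenset(), "b": frozenset({"c2"})})
```

In the example, vertex b is attracted to a for player 0 only under c1, because under c2 player 1 can stay on the self loop of b; conversely, a is attracted to b for player 1 under c2.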


6 Implementation and Experiments 


As an initial validation of our approach we experimented with two SPL examples, 
viz. the well-known minepump and elevator case studies first recognised as SPLs 
in [3,14], modelled for the mCRL2 toolset [20,21]. 

A prototype for solving variability parity games connecting to the mCRL2 
toolset was implemented in C++ using the BuDDy package [31,32] for BDD 
operations. The prototype uses BDDs to represent product families; parity games 
are represented as graphs with adjacency lists for incoming and outgoing edges. 
For the recursive algorithm, bit vectors are used to represent sets of vertices 
sorted by parity then by priority. All experiments were run on a standard Linux
desktop with Intel i5-4570 3.20GHz processor and 8GB DDR3 internal memory.³


6.1 Minepump Case Study 


The minepump example of [33], in the SPL variant of [4], describes a configurable
software system coordinating the sensors and actuators of a pump for mine
drainage. The purpose of the system is to keep a mine shaft free from water.
A controller operates a pump that may not start nor continue running in the
presence of dangerously high levels of methane gas. To this end, it needs to
communicate with sensors that measure the water and methane levels. The SPL
model has 11 features and 128 products; the resulting FTS consists of 582 states
and 1376 transitions. The mCRL2 code of this model, developed for [19], closely
follows the fPROMELA code of [4] (also used in [16]) that is distributed with [8].


³ Solvers and experiments: https://github.com/SjefvanLoo/VariabilityParityGames

We verified nine properties, φ₁ to φ₉, for the minepump case study, examined
also elsewhere in the SPL literature (cf., e.g. [3, 4, 7, 16, 19, 24, 34-36]). These
induce variability parity games consisting of approximately 3000 to 9200 vertices
and 2 to 4 different priorities. Specifically, for properties φ₁, φ₄, and φ₇, we used
the following formulae, expressed in the mCRL2 variant of the modal μ-calculus,
which allows mixing fixed points, regular expressions, and first-order constructs.


Property φ₁. Absence of deadlock: [true*] <true> true


Property φ₄. The pump cannot be switched on infinitely often:


( mu X. nu Y. ([pumpStart] [!pumpStop*] [pumpStop] X &&
[!pumpStart] Y )) && ( [true*] [pumpStart] mu Z. [!pumpStop] Z )


Property φ₇. The controller can always eventually receive/read a message, i.e.
return to its initial state from any state: [true*] <true*> <receiveMsg> true


While φ₄ is a common LTL-type formula, φ₇ is typical for CTL. Table 3 provides
the running times for verification of properties φ₁ to φ₉ via variability parity
games, and the sizes of classes (𝒫⁺, 𝒫⁻) partitioning 𝒫. The results show that
the collective solving strategy for family-based SPL model checking outperforms
the individual solving strategy for product-based SPL model checking.

While a full baseline comparison with other SPL model checking algorithms 
was not performed, our approach promises to be at least as efficient as related 
approaches. This conjecture is based on the running times reported for properties 
φ₁, φ₄, and φ₆ in [4, 16, 19] (all verified with standard computers of that time).
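
The gain of the collective over the individual solving strategy can be quantified as the ratio of the two running times reported in Table 3. The short Python sketch below computes these ratios; the running times are copied from Table 3, while the dictionary keys and the function name `speedups` are hypothetical.

```python
# Running times in ms from Table 3: (product-based, family-based).
MINEPUMP = {
    "phi1": (28.88, 3.92), "phi2": (54.79, 6.76), "phi3": (184.7, 24.70),
    "phi4": (145.0, 37.46), "phi5": (144.5, 12.19), "phi6": (242.9, 42.79),
    "phi7": (134.3, 11.71), "phi8": (17.44, 1.058), "phi9": (110.0, 6.853),
}
ELEVATOR = {
    "psi1": (14335, 5409), "psi2": (14988, 5744), "psi3": (16045, 5020),
    "psi4": (16865, 5272), "psi5": (8954, 3013), "psi6": (4252, 772),
    "psi7": (4171, 765),
}


def speedups(times):
    """Ratio of product-based to family-based running time per property."""
    return {name: product / family for name, (product, family) in times.items()}


minepump_speedups = speedups(MINEPUMP)
elevator_speedups = speedups(ELEVATOR)
```

With these figures the family-based approach is roughly 3.9 to 16.5 times faster for the minepump properties and roughly 2.6 to 5.5 times faster for the elevator properties.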


Table 3. Running times (in ms) for experiments for the product-based and
family-based SPL model checking of the minepump and elevator case studies using
the recursive algorithm for variability parity games.


Minepump SPL                                  Elevator SPL

Property  product  family  |𝒫⁺|/|𝒫⁻|          Property  product  family  |𝒫⁺|/|𝒫⁻|
φ₁        28.88    3.92    128/0              ψ₁        14335    5409    2/30
φ₂        54.79    6.76    0/128              ψ₂        14988    5744    4/28
φ₃        184.7    24.70   0/128              ψ₃        16045    5020    4/28
φ₄        145.0    37.46   96/32              ψ₄        16865    5272    4/28
φ₅        144.5    12.19   96/32              ψ₅        8954     3013    16/16
φ₆        242.9    42.79   112/16             ψ₆        4252     772     32/0
φ₇        134.3    11.71   128/0              ψ₇        4171     765     32/0
φ₈        17.44    1.058   128/0
φ₉        110.0    6.853   0/128


6.2 Elevator Case Study


The other configurable system we considered is the elevator example of [37] of a 
lift travelling between five floors. A product in the elevator system may or may 
not provide the features of parking, load and overload detection, cancelling on 
emptiness, and priority for specific floors. Absence or presence of specific features
in a system configuration generally leads to different behaviour. The behaviour of
the lift itself is governed by the so-called single button collective control strategy, 
deciding which floor is visited next. Roughly speaking, and dependent on the 
specific feature setting, the lift operates in sweeps, only changing direction if 
there are no outstanding calls in the current direction. The FTS implementation 
in mCRL2 underlying the experiments is derived from the 120 lines of SMV code 
presented in [37]. Although the number of features in this SPL example is small, 
viz. only 5 independent features resulting in 32 different configurations, the FTS 
consists of 95591 states and 622265 transitions. 

The seven properties ψ₁ to ψ₇ that we experimented with for the elevator case
study were adapted from [37]; they are also examined elsewhere in the literature
(cf., e.g. [10-12, 14, 15, 25, 26, 35, 38]). These induce variability parity games
consisting of approximately 440000 to 18500000 vertices with 2 to 3 different
priorities. The properties cover a proper handling of requests, correct behaviour
with respect to the control strategy, proper behaviour when idling, and the
possibility to stop at floors while passing. By way of illustration, properties ψ₂, ψ₃,
and ψ₅ are expressed as follows in the mCRL2 variant of the modal μ-calculus.


Property pə. Invariantly, if a lift button is pressed for a floor, the lift will even- 
tually open its doors on this floor: 


[true*] forall i:Floor. [liftButton(i)] 
( mu X. ( [!open(i)] X && <true> true ) ) 


Property w3. Invariantly, if the lift is travelling up while there are calls above 
the lift will not change direction: 


[true*] ( ([ direction(up) . 
(!(direction(down) || exists k:Floor. open(k)))* ] 
forall i:Floor. val(1 <= i && i <= 5) => 
[ open(i) ] forall j:Floor. val(i < j && j <= 5) => 
[ liftButton(j) ] mu Y. ( [!open(j)] Y && 
[direction(down)] false && < true > true ) ) ) 


Property ψ5. Invariantly, if the lift is idling, it does not change floors: 


( forall i:Floor. val(1 <= i && i <= 5) => 
<true*.idling(i)> true ) && 

( [true*] forall i:Floor. val(1 <= i && i <= 5) => 
[ idling(i) ] nu Y. <idling(i)> Y ) 

Note, in particular with regard to property ψ5, that unlike the original SMV 
elevator system, our lift idles with its doors open. This prevents the situation 
where someone in the lift presses the landing button for the current floor 
infinitely often, keeping the process busy without the lift making any movement. 

Also in the case of the elevator system we notice a significant difference in 
performance between product-based model checking, which calls the individual 
solving strategy, and family-based model checking, which calls the collective 
solving strategy. The difference is, however, less striking than in the minepump 
case study, which, we believe, is due to the small number of different features. 

As said, a full baseline comparison with other SPL model checking algorithms 
was not performed. For one, the efficiency of our approach with respect to related 
approaches is not easily measured with the elevator case study. While properties 
pa and ws were verified also in [14, 15,25, 26,35,38], not much can be concluded 
from the reported running times. First, our model’s mCRL2 code was developed 
from scratch, following the SMV code from [37], and not the fPROMELA code 
of [14, 15, 25, 26, 35,38]. Moreover, the number of floors in these models ranges 
from 4 to 6. In [10-12], finally, the models are probabilistic, the number of floors 
ranges from 2 to 40, and different (probabilistic) properties were verified. 


7 Conclusions 


We have introduced variability parity games as a generalisation of parity games, 
reflecting the generalisation by FTSs of LTSs, and have defined the SPL model 
checking problem of modal μ-calculus formulae on FTSs as a variability parity 
game solving problem, for which we have provided a recursive algorithm based 
on a collective, family-based solving strategy. To illustrate the efficiency of the 
approach, we have applied it to two classical examples from the SPL literature, 
viz. the minepump and the elevator case studies. The experiments show that the 
collective, family-based strategy of solving variability parity games typically 
outperforms the individual, product-based strategy of solving the standard parity 
games obtained by projection from the variability parity games. 
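
As an illustration of the projection used by the product-based strategy, the sketch below represents a variability parity game with edges labelled by sets of admitted configurations and projects it onto a single configuration. The data layout, the names and the tiny example game are assumptions made for this sketch; the formal definition in the paper may impose further conditions that are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariabilityParityGame:
    owner: dict      # vertex -> 0 (even player) or 1 (odd player)
    priority: dict   # vertex -> natural number
    edges: dict      # (source, target) -> frozenset of admitted configurations


def project(game, config):
    """Edges of the standard parity game obtained for one configuration."""
    return {edge for edge, admitted in game.edges.items() if config in admitted}


# Hypothetical game over two configurations "c1" and "c2".
BOTH = frozenset({"c1", "c2"})
GAME = VariabilityParityGame(
    owner={"a": 0, "b": 1},
    priority={"a": 2, "b": 1},
    edges={
        ("a", "b"): BOTH,                 # admitted by 100% of the configurations
        ("b", "a"): frozenset({"c1"}),    # admitted by 50% of the configurations
        ("b", "b"): frozenset({"c2"}),
    },
)

# Product-based solving would handle each projection separately.
PROJECTIONS = {c: project(GAME, c) for c in sorted(BOTH)}
print(PROJECTIONS["c1"] == {("a", "b"), ("b", "a")})  # True
```

The family-based strategy instead works on the labelled game directly, so that work on edges shared by many configurations is done once.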

Further experiments are needed to measure and pinpoint the differences in 
efficiency. One direction for future work is to generate a sufficient number of 
random variability parity games to this aim. In particular, the configuration sets 
that label the edges of the variability parity games for the minepump and elevator 
case studies obey a very specific distribution, typically admitting either 100% 
or 50% of the configurations. It would be interesting to see how our approach 
behaves in case of SPLs with more complexly structured feature diagrams. 

There is a wealth of different algorithms available for parity games, of which 
the recursive algorithm that we have here lifted to variability parity games is one 
of the most competitive ones in practice. Nevertheless, we think it pays to study 
other algorithms and lift these to variability parity games, too. Finally, we believe 
that variability parity games have applications beyond SPL model checking; e.g. 
in (parameter) synthesis problems. We leave these topics for future research. 

Abstract. This paper introduces a modelling environment for service design that 
currently supports five different notations (Business Model Canvas, e3value, 
Service Blueprint, Process Chain Network and BPMN). In addition, the tool supports 
the generation of partial views of models in one notation from models made with 
another, along with the corresponding relations model. 


1 Motivation 


Born in the context of research on services marketing, service design evolved and 
gained impact thanks to its promotion by IDEO, and it has eventually been established 
as the entry point to service development for any organization seriously concerned 
with user experience and digital transformation. An example is the British 
government's effort in this regard, materialized through the Government Digital 
Service initiative. 

Business modeling is essential to a successful service design, since companies 
need to constantly redesign their business model [3] as they strive for a successful 
servitization process. To that end, it is key that all the departments of the 
organization share a clear vision and a common understanding of such models, even 
when their working languages are different, which, in the case of business models, 
implies using different notations [4]. Indeed, the literature reveals a huge number 
of definitions of what a business model is, since the concept has historically been 
considered from three different perspectives: technology-oriented, strategy-oriented 
and organization-oriented [2]. Some authors even distinguish four categories of 
business modelling, namely business process models, business motivation models, 
business organization models and business rules models [1]. 

Two of these business modeling disciplines, namely business organization and 
business process models, are at the core of service design. Indeed, service design 
involves both business models oriented towards providing a quick and strategic 
overview of the organization, such as the Business Model Canvas or the e3value 
model, and business models oriented towards showing the details of a particular 
service offering, like Service Blueprints, Process Chain Networks or BPMN models. 

Even though tool support is currently available for some of these techniques, there 
is no comprehensive solution that allows working with all of them. Therefore, a 
different stand-alone tool must be used for each notation, typically a generic 
diagramming tool or web-based app such as MS Visio or Lucidchart. Although these can 
be good options for quick sketching, such tools were not devised to enable later 
processing of the information gathered in such models [5]. 

Since the only option for supporting some notations was to develop a new tool, 
owing to the lack of any previous tool support (PCN) or of model-based tool support 
(Canvas and e3value), a comprehensive toolkit was developed to facilitate the 
building of technological bridges among notations as well as the implementation of 
post-processing tasks, such as validation, autocorrection or model transformations. 

To address these problems, this report introduces the latest version of INNoVaServ: 
a modelling toolkit that comprises a set of visual DSLs implementing different 
business modeling notations. Compared with previous versions, this one provides tool 
support for new notations (PCN and BPMN) in the shape of DSLs, and it also bundles 
the tooling needed to register and manage the relationships among business models 
defined through such DSLs. To that end, the toolkit supports the generation of 
partial models from models expressed in another notation, along with trace (or 
relations) models collecting the relationships between the elements of the models 
involved in such a transformation. In addition, INNoVaServ supports the validation 
of Service Blueprint and PCN models by means of formal techniques [6, 7] and puts 
together syntax and semantic checkers for each of the notations supported by the 
framework. 


2 Technological Solution 


This section first discusses the conceptual architecture of the modelling toolkit intro- 
duced in this work to later summarize its development process. 


2.1 Conceptual Architecture 


The conceptual architecture of INNoVaServ, which results in a high level of modulari- 
zation, is illustrated in Figure 1 and can be described according to two orthogonal di- 
mensions. 

On the one hand, INNoVaServ can be thought of as a set of five integrated DSLs, 
one for each business modelling notation supported by the tool. This way, in the 
horizontal dimension of Figure 1, five different modules corresponding to five 
different DSLs can be distinguished: Business Model Canvas, e3value, Service 
Blueprint, PCN and BPMN. 

On the other hand, the conceptual architecture of INNoVaServ leans on the 
functionality provided by EMF to follow the separation of concerns principle [9] by 
distinguishing the presentation of each model from the model itself. This way, the 
presentation tier includes the components needed to support the editing and 
representation of models, whereas the models are managed by the logic tier. As the 
right-hand side of Figure 1 shows, this distinction corresponds to the usual 
distinction between the concrete and the abstract syntax of any modelling language. 

Besides, a layer connecting the five DSLs is depicted in the lower part of Figure 1. 
Following the idea suggested in [10] to refer to all the tasks related to model 
management, this layer is called the model processor and serves as a container for 
different components supporting the model management tasks to be bundled in the 
tool (validation, weaving, transformation, etc.). 


[Figure 1: five DSL modules (Canvas, e3value, Service Blueprint, PCN, BPMN), each 
with a UI component (concrete syntax, presentation tier) on top of a model component 
(abstract syntax, logic tier), all connected by a shared model processor layer.] 


Figure 1. INNoVaServ conceptual architecture. 


2.2 Development Process 


Initially, each of the DSLs bundled in INNoVaServ was built atop Eclipse 
EMF/GMF according to the guidelines sketched in [5] for the development of 
model-based tools that take the shape of DSL toolkits. 

However, due to the recent lack of GMF support, it was necessary to migrate the 
DSLs from GMF to Sirius. Sirius is also based on Eclipse EMF, and the development 
process of editors for graphical DSLs with Sirius is similar to that of GMF: 
specification of the metamodel; definition of the concrete syntax (Sirius shows 
results in real time and makes it easy to create different viewpoints for the same 
abstract syntax); identification of the relationships between the models collecting 
the definition of the abstract and the concrete syntax; creation of the tool 
palette; and finally, manual refinement (if needed) of the generated code. 

A series of additional functionalities has also been added to the graphical editors 
developed, such as the automatic validation and fixing of models using the Acceleo 
language. In addition, to materialize the relationships among the notations 
supported by the tool, a generic relations (or traces) metamodel (Figure 2) has been 
defined to support the creation of simple relations models, and the Epsilon family 
of languages has then been used to implement a set of model transformations. This 
way, when any of these transformations runs, a relations model is generated along 
with the corresponding target model. Specifically, ETL has been used, since it 
supports many-to-many model transformations and eases the combination of declarative 
rules with imperative constructs and lazy and greedy rules. This is an essential 
feature, since many of the model transformations developed are not direct, but 
require a certain level of interaction with the user in order to collect the design 
decisions that should guide the transformation. In this sense, EOL has been used to 
improve user interaction by means of dialog boxes and to handle the transformations 
accordingly. 
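
The idea of producing a relations model alongside the target model can be sketched as follows. This is a Python illustration rather than ETL, and everything in it is hypothetical: the element kinds, the rule name KeyActivity2Actions and the mapping from a Canvas key activity to Service Blueprint actions are invented for the example, and the callback merely stands in for the dialog-based user interaction described above.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str
    name: str


@dataclass
class Relation:
    sources: list   # elements of the source model
    targets: list   # elements of the generated target model
    rule: str       # name of the rule that created the link


@dataclass
class RelationsModel:
    relations: list = field(default_factory=list)


def canvas_to_blueprint(canvas, choose_actions):
    """Generate a partial target model together with its relations model."""
    target, traces = [], RelationsModel()
    for element in canvas:
        if element.kind != "KeyActivity":
            continue  # only a partial view of the source model is transformed
        # The design decision is delegated to the caller (a dialog in the tool).
        actions = [Element("Action", n) for n in choose_actions(element)]
        target.extend(actions)
        traces.relations.append(Relation([element], actions, "KeyActivity2Actions"))
    return target, traces


CANVAS = [Element("KeyActivity", "Deliver order"), Element("CustomerSegment", "Students")]
TARGET, TRACES = canvas_to_blueprint(
    CANVAS, lambda e: [e.name + " (onstage)", e.name + " (backstage)"]
)
print(len(TARGET), len(TRACES.relations))  # 2 1
```

Each relation links one source element to several target elements, so a single relation is enough here; many-to-many links would simply carry several sources as well.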

Figure 2. Generic relations metamodel for INNoVaServ. 


We are aware of the variety of trace metamodels existing in the literature. Indeed, 
some of them have been co-authored by us [11]. Nevertheless, a generic, simple 
metamodel seemed enough to provide a proof of concept for the proposal. A more 
complete metamodel, enabling the identification of more sophisticated relationships, 
could be adopted later. 

To handle and visualize the information collected in the relations models, 
Modelink, a simple yet useful multi-panel editor provided by Epsilon, is used. It 
consists of two or three side-by-side EMF tree-based editors, which allow 
visualizing the source and target models along with the relations model. Note that 
the relationships collected in the latter can be edited directly in the editor. 

Again, it is worth noting that the visualizations provided by Modelink are planned 
to be improved by developing ad-hoc multi-panel editors like those presented in [11]. 
For instance, integrated overviews of all the models involved in a given project and 
their relationships could be supported this way. 

Finally, since the toolkit is still basically an EMF/GMF tool, it is interoperable 
with any other existing EMF/GMF tool. Note that there are plenty of them, since 
EMF/GMF has become the de facto standard for the development of model-based tools 
over the last 10 years. For instance, leaning on Papyrus, UML models could be almost 
immediately combined with those supported by INNoVaServ for service design tasks. 


3 Related Works 


This section reviews existing works in the area from both the methodological and 
the technical point of view. However, it is worth noting from the beginning that 
none of the existing works or tools deals with all the notations supported by 
INNoVaServ, nor provides tool support to enable the processing of the information 
generated during a service design project. 

A quick look at the many systematic literature reviews on business process 
modelling and the topics they cover shows that this is arguably the most mature 
business modelling discipline. Indeed, recent reviews do not focus on characterizing 
existing proposals, since that has largely been done in the past, but on the 
mechanisms available to assess their quality [12] or complexity [13]. 

However, despite the number of works in the area, new approaches for business 
process modelling [15] and BPMN dialects [14] still appear every so often. Indeed, 
many of them focus on shortening the distance between professionals from business 
areas and business process modelling notations [16]. 

By contrast, instead of defining yet another business process modelling language 
tailored to business professionals, INNoVaServ aims to provide them with tool 
support for the languages they are already using, like the Service Blueprint, BPMN 
or PCN. At the same time, providing support for strategy- and organization-oriented 
business modelling notations like Business Model Canvas or e3value will help to 
shorten the historical distance between IT and business practitioners. The models 
defined and handled by management areas become directly connected (or even mapped) 
to the models used by IT practitioners, which are more frequently expressed in terms 
of BPMN or UML. 

On the other hand, despite the recent interest attracted by the field due to the 
rise of product-service systems [8], business intelligence modelling [17] and some 
other disciplines, research on strategy- and organization-oriented business 
modelling is still at an early stage, probably because the business process model 
hype preceded the business model one [4]. 

Regarding tool support, since no tool has been found that supports the five 
notations integrated in INNoVaServ, some of the existing tools supporting at least 
two of them are briefly discussed in the following. 

Canvanizer and Real Time Board are web-based applications that support 
collaborative editing of Business Model Canvas and Service Blueprint diagrams. They 
offer a simple and intuitive graphical interface (especially the latter), but they 
are not based on models, so the represented information is merely graphical. They do 
not offer export capabilities in a format suitable for post-processing (such as 
XML), so the output is reduced to a simple image. Both are commercial solutions that 
offer free limited editions. 

Tool support for e3value was so far limited to the e3editor, a desktop application 
that allows e3value diagrams to be represented graphically and accurately. Models 
can be persisted in RDF format, which simplifies export/import tasks. 

Regarding PCN, no tool has been found supporting this notation. The only way of 
defining PCN diagrams to date was using generic diagramming apps or even image 
editors, like MS Visio or Lucidchart. 

Finally, as already mentioned, there are plenty of BPMN tools, such as Bonita Stu- 
dio, Signavio, BizAgi or IBM WebSphere, each one providing different capabilities. 

All this given, to the best of our knowledge this is the first proposal to consider 
the business modelling notations discussed here and to provide tool support for 
using them in the context of an integrated environment that eases the transition 
between the notations considered. 

Abstract. In the context of software model-driven development, artifacts are 
specified by several models describing different aspects, e.g., different views, 
dynamic behavior, structure, distributed information, etc. Then, maintaining and 
repairing consistency of the whole specification are crucial issues if the models 
can be separately developed and updated. Model Synchronization is the process 
of restoring consistency after the update of one or several of the models. In the 
present work, we approach the case when conflicts may arise due to concurrently 
updating different models. Specifically, based on the Triple Graph Grammar ap- 
proach, we propose an incremental algorithm CSynch for solving conflicts and 
repairing consistency. In addition, we identify and formalize when a synchroniz- 
ing solution can be considered adequate and show that our procedure CSynch is 
sound and complete. 


1 Introduction 


In the context of model-driven development, artifacts are specified by several 
models describing different aspects, e.g., different views, dynamic behaviour, 
structure, interactions, etc. Moreover, a given set of models is said to be 
consistent if they describe some software artifact. Along the process of designing 
and implementing an artifact, and also after the artifact is implemented, it is 
common to modify or update some aspects of a given model, or of several models. 
These changes may cause inconsistencies among the given set of models. To restore 
consistency, we have to propagate these modifications to the rest of the models. 
This process is called model synchronization. If, at each time, we just propagate 
the updates on one model, synchronization is said to be sequential, but if we 
simultaneously propagate updates on several models, synchronization is called 
concurrent. Most existing work on model synchronization deals with the sequential 
case, which is simpler than the concurrent one, since in the latter case we have to 
deal with possible inconsistencies between the modifications applied to different 
models, implying that in the synchronization process we may need to backtrack some 
updates. Moreover, the existing approaches to concurrent synchronization 
[37,38,14,11,34,35] are based on sequentializing the process, i.e., on combining in 
some way propagation procedures defined in sequential synchronization. For this 
reason, these approaches are called propagation-based in [24], where it is shown 
that they have important limitations. 

When the given concurrent updates are inconsistent among themselves, the 
synchronization procedure must backtrack some of these updates to restore 
consistency. However, in this case, not all synchronizing solutions are adequate. 
For instance, a possible inadequate solution would be to backtrack all updates. None 
of the approaches considering conflict resolution [14,11,34,35] defines any form of 
adequacy, other than consistency of the given result. Moreover, these approaches 
return only one possible solution, which may not coincide with the user's wishes. 

A simple but powerful way of describing a class of consistent (synchronized) models 
is by using a Triple Graph Grammar (TGG) [27,28], since this approach provides 
techniques and tools that allow the general formulation and resolution of problems 
associated with synchronization. In recent years these techniques have had 
considerable success, producing a large number of contributions of proven utility. 

In [10], it is claimed that synchronization procedures should be incremental, 
meaning that their execution cost should not depend on the size of the models, but 
on the size of the update, so that the final consistent models need not be rebuilt 
from scratch. Other approaches that propose incremental sequential synchronization 
procedures are [22,12,25]. In contrast, none of the existing approaches to 
concurrent synchronization is incremental. 

The main contributions of this paper are: 


— The definition of properties, other than consistency, to ensure the adequacy of con- 
current synchronization solutions. 

— The definition of a non-deterministic incremental algorithm for concurrent syn- 
chronization, that is not propagation-based, whose solutions satisfy our adequacy 
properties. The algorithm is nondeterministic to consider the possible choices of 
conflict resolution. In particular, the algorithm is shown to be complete, in the sense 
that it finds all adequate solutions to the synchronization problem. 


The rest of the paper is organized as follows. In Sect. 2, we summarize the basic and 
preliminary notions and terminology required in the rest of the paper, and we introduce 
a running example. In Sect. 3 we introduce and formalize the properties that should be 
satisfied by the synchronizing solutions in order to be considered adequate. In Sect. 4, 
we propose our synchronizing algorithm which is proven to find all solutions that satisfy 
the properties mentioned above. Finally, in Sections 5 and 6 we present related work, 
conclude and describe future work. 


2 Preliminaries 


In this section, we describe some basic notions and terminology concerning model 
transformation and model synchronization by Triple Graph Grammars (TGGs). More- 
over, we introduce the example that we will use in the paper. 

2.1 Triple Graph Grammars 


TGGs are a formalism developed by Schürr [27] to specify and implement model 
transformations. They are based on three main ideas: 


— Models can be represented by some kind of graphs. 

— Instead of representing a consistent pair of models by two graphs, it is better to do 
it by a triple graph ([27]) which, in addition, includes the correspondence between 
the elements of the two models. 

— To specify the class of consistent triple graphs we use a (triple graph) grammar, i.e., 
a triple graph is consistent if it can be generated from a given start graph (typically, 
the empty graph) using the production rules of the grammar. 


More precisely, a triple graph $G = (G^S \xleftarrow{s_G} G^C \xrightarrow{t_G} G^T)$ 
consists of a source graph $G^S$ and a target graph $G^T$, which are related via the 
correspondence graph $G^C$ and two mappings (graph morphisms) $s_G: G^C \to G^S$ and 
$t_G: G^C \to G^T$ specifying how source elements correspond to target elements³. 
For simplicity, we use the notation $\langle G^S, G^T \rangle$ whenever the explicit 
correspondence graph can be omitted. 
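
As a data-structure view of this definition, the following sketch stores the three graphs and the two mappings explicitly. It is a simplification under stated assumptions: the mappings are given on nodes only, graphs are untyped, and all class and field names are ours.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    nodes: set
    edges: set  # (source_node, target_node) pairs


@dataclass
class TripleGraph:
    source: Graph
    corr: Graph
    target: Graph
    s: dict  # s_G on nodes: correspondence node -> source node
    t: dict  # t_G on nodes: correspondence node -> target node

    def is_well_formed(self):
        """Both mappings are total on G^C and land in G^S and G^T."""
        total = set(self.s) == self.corr.nodes == set(self.t)
        lands = (set(self.s.values()) <= self.source.nodes
                 and set(self.t.values()) <= self.target.nodes)
        return total and lands

    def corresponding(self, source_node):
        """Target nodes related to a source node via the correspondence graph."""
        return {self.t[c] for c in self.corr.nodes if self.s[c] == source_node}


# One class c1 related to one table t1 through correspondence node k1.
TG = TripleGraph(
    source=Graph({"c1"}, set()),
    corr=Graph({"k1"}, set()),
    target=Graph({"t1"}, set()),
    s={"k1": "c1"},
    t={"k1": "t1"},
)
print(TG.is_well_formed(), TG.corresponding("c1"))  # True {'t1'}
```

Dropping the `corr`, `s` and `t` fields gives the abbreviated pair notation used when the correspondence graph is not needed.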


Then, a TGG $\mathcal{G}$ consists of a start triple graph⁴, $SG$, and a set of 
production rules of the form $r: L \to R$, where $L$ and $R$ are triple graphs and 
$L \subseteq R$. Then, $\mathcal{L}(\mathcal{G}) = \{G \mid SG \Rightarrow^{*} G\}$ is 
called the class of consistent models and 
$\mathcal{D}(\mathcal{G}) = \{SG \Rightarrow^{*} G\}$ is the set of derivations defined 
by $\mathcal{G}$, where $\Rightarrow^{*}$ is the reflexive and transitive closure of 
the one-step transformation relation $\Rightarrow$ defined as follows: 
$G_1 \Rightarrow G_2$ if there is a production rule $r: L \to R$ in $\mathcal{G}$ and a 
matching monomorphism $m: L \to G_1$ such that $G_2$ can be obtained by replacing 
(the image of) $L$ in $G_1$ by (a corresponding image of) $R$. Formally, this means 
that the square formed by $r: L \to R$, $m: L \to G_1$, and the induced morphisms 
$R \to G_2$ and $G_1 \to G_2$ is a pushout in the category of triple graphs. In this 
case, we write $G_1 \xRightarrow{r,m} G_2$, or just $G_1 \Rightarrow G_2$ if $r$ and 
$m$ are implicit. 

For instance, in Fig. 1 we depict the graph grammar that we use as a running example 
to illustrate our techniques. It is a simplified, and slightly modified, version of 
the well-known transformation between class diagrams and relational schemas. 

The graphs considered in this example are typed, which means that a type graph 
describes the different classes of nodes and edges of our triple graphs, in a similar 
way as a metamodel describes the kinds of elements that we have in a model. In 
particular, the type graph of our example is depicted on the left of Fig. 1. Source 
models, whose type graph is depicted on the left of the type triple graph, consist 
of three kinds of nodes: classes, attributes and sub-attributes⁵, and three kinds of 
edges: a (thick) edge between two classes represents a subclass relationship between 
them; attributes are bound to their associated classes and sub-attributes to their 
associated attribute, respectively, by the second and third kind of (thin) edges. 
Similarly, the type graph of target models is depicted on the right of the type 
triple graph, consisting of tables, columns and sub-columns, together with edges 
between them. Finally, in the middle, there is the type graph of the correspondence 
models, consisting of three kinds of nodes: square nodes to bind classes with their 
associated tables, round nodes to bind attributes with their associated columns, and 
triangle nodes to bind sub-attributes with their associated sub-columns. 


3 In the context of this paper, it does not make too much sense to speak about source and target 
models. Nevertheless, we have kept this terminology to simplify the notation for referring to 
each of the two models involved. 

4 As said above, without loss of generality, we consider that SG is always the empty triple graph. 

5 It is not necessary to associate any semantics to sub-attributes and sub-columns since we just 
use them to introduce a bit more complexity to the example. 


Fig. 1. Type graph, four rules for class-to-table transformations 

The rules of the TGG defining the consistent transformations between class diagrams 
and relational schemas are depicted on the right of Fig. 1. Rule r1, Class2Table, 
creates a new class and its corresponding table, together with the correspondence 
element that relates the class and the table. Rule r2, Attribute2Column, given a 
class and a corresponding table, creates an attribute of that class, a related 
column of the table, and their associated correspondence element. Rule r3, 
Subclass2Table, given a class and a corresponding table, creates a new subclass. In 
this case, the subclass is related to the table through a new correspondence 
element. Finally, rule r4, SubAttribute2SubColumn, creates a new sub-attribute 
together with its corresponding sub-column. 

On the left of Fig. 2 we depict a triple graph generated by this grammar. For 
instance, it could have been created from the empty graph by first applying rule 
Class2Table twice, to create classes c1 and c2 together with their associated tables 
t1 and t2 and correspondence elements; next, applying rule Subclass2Table, to create 
c3 as a subclass of c2, together with a correspondence element that specifies that 
t2 is the table associated to c3; finally, applying rule Attribute2Column three 
times, to create attributes a1, a2 and a3, together with their associated columns, 
the associated edges binding attributes and columns to their classes and tables, and 
their correspondence elements. 
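
The derivation just described can be replayed with a small sketch in which each rule is a function that extends a triple graph. The dictionary-based encoding is an assumption of this sketch, as are the column names co1 and co2 and the assignment of attributes a1, a2, a3 to classes c1, c2, c3, which the text does not fix.

```python
def empty_triple_graph():
    """Start graph: the empty triple graph."""
    return {
        "classes": set(), "subclass": set(), "attr_of": {},   # source graph
        "tables": set(), "col_of": {},                        # target graph
        "corr": set(),                                        # correspondence pairs
    }


def table_of(g, cls):
    """The unique table in correspondence with a class (the rule's match)."""
    (table,) = {t for (c, t) in g["corr"] if c == cls}
    return table


def class2table(g, cls, table):
    g["classes"].add(cls)
    g["tables"].add(table)
    g["corr"].add((cls, table))


def subclass2table(g, parent, sub):
    table = table_of(g, parent)           # requires parent and its table to exist
    g["classes"].add(sub)
    g["subclass"].add((sub, parent))
    g["corr"].add((sub, table))           # the subclass shares the parent's table


def attribute2column(g, cls, attr, col):
    table = table_of(g, cls)              # requires the class and its table to exist
    g["attr_of"][attr] = cls
    g["col_of"][col] = table
    g["corr"].add((attr, col))


g = empty_triple_graph()
class2table(g, "c1", "t1")
class2table(g, "c2", "t2")
subclass2table(g, "c2", "c3")
attribute2column(g, "c1", "a1", "co1")
attribute2column(g, "c2", "a2", "co2")
attribute2column(g, "c3", "a3", "co3")
print(table_of(g, "c3"))  # t2
```

Because every step only adds elements, any graph built this way is in the class of consistent models by construction.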


2.2 Model Update and Model Synchronization 


For different reasons, given a consistent model G, we may perform some modifications 
or updates on it, producing a model G' that is not consistent anymore. Then the 
synchronization problem consists of repairing that model, so that it becomes 
consistent. 

For instance, in our running example, we assume that the consistent model on the 
left of Fig. 2 is given, and that two updates are defined on that consistent model: 
removing the subclass relation between c2 and c3 in the source model, and adding a 
new sub-column sco3 to the column co3 in the target model. In the middle of the 
figure some elements of the triple graph have been marked. These marks 
({+, x, !, ?}) represent possible actions to be taken on the elements (adding, 
deleting or keeping them) as the result of the analysis performed in our algorithm, 
which we describe in the paper. Some elements have several marks that are 
contradictory. This tells us that some conflicting situations may arise when 
defining a repair. Finally, on the right of the figure, there is one possible repair 
of the marked triple graph that avoids conflicts and restores consistency. As we 
will see, this repair can be made incrementally, acting only on some elements (grey 
area) without having to rebuild the whole triple graph. 


Fig. 2. Concurrent update, marked affected area with conflict and possible repair 


Formally, an update or modification [8] $u$ on a graph $G$ is a span of inclusions 
$u: G \leftarrow K \rightarrow G'$ for some graph $K$. Intuitively, the elements in 
$G$ that are not in $K$ are the elements deleted by $u$, and the elements in $G'$ 
that are not in $K$ are the elements added by $u$. So, $K$ consists of all the 
elements in $G$ that remain invariant after the modification. When $K$ may remain 
implicit we will denote the update $u: G \leftarrow K \rightarrow G'$ by 
$u: G \Rightarrow G'$. 
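
A span of inclusions can be sketched by flattening each graph to a set of element names (nodes and edges alike). This flattening and all names below are assumptions made for illustration, and the example covers only the fragment of the source model touched by the first update of the running example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    before: frozenset  # elements of G
    kept: frozenset    # elements of K
    after: frozenset   # elements of G'

    def __post_init__(self):
        # A span of inclusions requires K to be included in both G and G'.
        if not (self.kept <= self.before and self.kept <= self.after):
            raise ValueError("K must be included in both G and G'")

    @property
    def deleted(self):
        return self.before - self.kept  # in G but not in K

    @property
    def added(self):
        return self.after - self.kept   # in G' but not in K


# Source update of the running example: remove the subclass edge from c3 to c2.
SOURCE_UPDATE = Update(
    before=frozenset({"c2", "c3", "subclass(c3,c2)"}),
    kept=frozenset({"c2", "c3"}),
    after=frozenset({"c2", "c3"}),
)
print(sorted(SOURCE_UPDATE.deleted), sorted(SOURCE_UPDATE.added))
# ['subclass(c3,c2)'] []
```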


Updates can be composed and decomposed [24]. Given two updates 
$v: G \leftarrow K_1 \rightarrow X$ and $w: X \leftarrow K_2 \rightarrow H$, the 
composition of $v$ and $w$ is the update 
$u = w \circ v: G \leftarrow K \rightarrow H$ such that, roughly, $K$ is the 
intersection of $K_1$ and $K_2$, i.e. $K$ includes all the elements of $G$ that are 
neither deleted by $v$ nor by $w$. In addition, we say that $u$ decomposes into $v$ 
and $w$ if $u = w \circ v$ and moreover no element added by $v$ is deleted by $w$. 
Roughly this means that $X$ is the union of $K_1$ and $K_2$ with respect to the 
common part $K$. If $u$ decomposes into $v$ and $w$, we also say that $v$ is a 
subupdate of $u$, which we denote by $v < u$, since in this case, $v$ adds and 
deletes fewer elements than $u$. 
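
Under the same set-based simplification (graphs flattened to sets of element names, which is our assumption and not the categorical construction of [24]), composition and the decomposition condition can be sketched as follows.

```python
from typing import NamedTuple


class Update(NamedTuple):
    before: frozenset  # G
    kept: frozenset    # K
    after: frozenset   # G'


def added(u):
    return u.after - u.kept


def deleted(u):
    return u.before - u.kept


def compose(v, w):
    """w after v: keep exactly the elements kept by both updates."""
    if v.after != w.before:
        raise ValueError("v must end where w starts")
    return Update(v.before, v.kept & w.kept, w.after)


def decomposes(u, v, w):
    """u = w o v and no element added by v is deleted by w."""
    return compose(v, w) == u and not (added(v) & deleted(w))


F = frozenset
V = Update(F("ab"), F("a"), F("ac"))        # deletes b, adds c
W = Update(F("ac"), F("ac"), F("acd"))      # adds d
W_UNDO = Update(F("ac"), F("a"), F("a"))    # deletes the c that v added

print(decomposes(compose(V, W), V, W))            # True
print(decomposes(compose(V, W_UNDO), V, W_UNDO))  # False
```

In the second case the composition is well defined, but the first update is not a subupdate of it, because part of its effect is undone by the second one.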

In the non-concurrent case, given a triple graph $G$ and an update 
$w^S: G^S \Rightarrow H^S$ on the source graph, the synchronization problem [16] is 
to find an update $w^T: G^T \Rightarrow H^T$ such that $H$ is consistent. In this 
case, we say that $w^T$ is the propagation of $w^S$. 


In contrast, in the concurrent case, given updates $u^S: G^S \Rightarrow H_0^S$ and 
$u^T: G^T \Rightarrow H_0^T$, or equivalently the triple graph update 
$(u^S, id, u^T): G \Rightarrow H_0$, also called a concurrent update, the concurrent 
synchronization problem is to find a concurrent update $w: G \Rightarrow H$ such 
that $\bar{u} = (u^S, id, u^T)$ is a subupdate of $w$ and $H$ is consistent. Previous 
work on this problem is based on building concurrent solutions by combining (in some 
way) $v^S$ and $v^T$, where $v^T$ (respectively, $v^S$) is the propagation of $u^S$ 
(respectively, of $u^T$). For this reason, in [24] these approaches are called 
propagation-based. However, as we pointed out in the introduction, in that paper it 
is shown that propagation-based approaches have important limitations. 

A main problem in concurrent synchronization is that the given updates $u^S$ and $u^T$
may be in conflict. For instance, $u^S$ may delete a node $n$ in $G^S$ and $u^T$ may add an edge
whose source node is in correspondence to $n$ in $G^T$. When a concurrent update is in
conflict it will be impossible to solve the synchronization problem, so we will have to
backtrack (or to ignore) some of the deletions or additions in $\bar{u}$ to eliminate that conflict.
In these situations, the concurrent synchronization problem needs to be reformulated. If
$\bar{u}$ is in conflict, we would look for an update $w$ such that a subupdate $\bar{u}'$ of $\bar{u}$
(i.e., some part of $\bar{u}$ not in conflict) is also a subupdate of $w$. This is equivalent to saying
that there is an update $v$ such that $v \circ \bar{u} = w$, where $v$ backtracks some conflicting
updates included in $\bar{u}$. We must note that detecting conflicts is in general not an easy task,
since $u^S$ and $u^T$ modify different models, so they do not directly interfere, which means that
conflicts are never explicit. We may also note that, according to this definition,
$id: G \Rightarrow G$, i.e., the identity modification that changes nothing, would always be a
solution to the concurrent synchronization problem (in this case $v$ would be the inverse of
$\bar{u}$, so we would completely backtrack $\bar{u}$). Obviously, this is not the kind of solution
that we want.


2.3 Dependency Relations 


Incrementality of (sequential or concurrent) model synchronization requires two con- 
ditions for any given approach: to be able to identify what part of the given model is 
affected by an update, so that the rest can remain unchanged and we can concentrate 
on the affected part to build a solution; and that we can do this identification without 
having to fully analyze the given consistent model. Otherwise, the computational cost 
of a synchronization algorithm will always depend on the size of the given models. 
Our approach to incrementality, which follows the ideas introduced in [25] for se- 
quential synchronization, is based on the idea that the structure of a given consistent 
model depends essentially on the derivation that was used to create it. We mean that if 
we perform any update on the model, we just have to care about the parts of that deriva- 
tion that are affected by the update. For instance, if the update consists of the deletion of 
some element, then the application of the rule that created that element in the original 
derivation and the further application of other rules that depend on that creation, will be
considered the affected part of the derivation. It must be clear that this does not mean
that, if $SG \Rightarrow G_1 \Rightarrow \dots \Rightarrow G_k \Rightarrow \dots \Rightarrow G$ is the
derivation used to create $G$, the deletion of an element in $G_k$ will affect all the rule
applications in the derivation $G_k \Rightarrow \dots \Rightarrow G$, because some of these rule
applications may be independent of that deletion. For instance, in our example, if the deleted
element is a class, the creation of other classes, attributes or subattributes that are not
related to that class would be independent of that deletion. Technically, the reason would be
that the application of these rules is sequentially independent ([6]) of the application of the
rule that created the class. In what follows we will denote by $d_G$ the derivation⁶ used to
create $G$.

Since in the synchronization algorithm we need to know which is the derivation used 
to create the given consistent model and storing and analyzing that derivation may be 
costly, the second idea of our approach is to define some dependency relations between 
the elements of G that allow us to know if the application of some rule depends on the 
application of another rule. We assume that these relations are stored together with G. 
The first relation, called strict dependency, denoted $e_1 <^G e_2$, holds if $e_1$ is matched by
the left-hand side of the rule that created $e_2$. For instance, in the triple graph on the
left of Fig. 2, we have $c_2 <^G c_3$ and $t_2 <^G c_3$, since the application of rule Subclass2Table
that creates $c_3$ has to match its left-hand side to $c_2$ and $t_2$. The second relation, called
interdependency, denoted $e_1 \bowtie^G e_2$, holds if $e_1$ and $e_2$ are created by the same rule
application. For instance, in Fig. 2, $c_2 \bowtie^G t_2$, since they are both created by the same
application of the Class2Table rule in $d_G$. Finally, dependency, denoted $\leq^G$, is the
reflexive and transitive closure of the union of $<^G$ and $\bowtie^G$.


Definition 1 (Dependency Relations [25]). Given a TGG $\mathcal{G}$ and a derivation
$d_G: SG \Rightarrow^* G$, we define the following relations on elements of $G$:

1. Strict dependency: $<^G$ is the smallest relation satisfying that if $d_G$ includes a
   transformation step $G_{i-1} \Rightarrow G_i$ that applies a rule $L \subseteq R$ with match
   $m: L \to G_{i-1}$ and comatch $m': R \to G_i$, then for every $e$ in $L$ and $e'$ in
   $R \setminus L$, $m(e) <^G m'(e')$.
2. Strict interdependency: $\bowtie^G$ is the smallest relation satisfying that if $d_G$ includes
   a transformation step as in item 1, then for every $e, e'$ in $R \setminus L$,
   $m'(e) \bowtie^G m'(e')$.
3. Dependency: $\leq^G = (<^G \cup \bowtie^G)^*$.

It may be noticed that there is a bijective correspondence between derivations (up to
permutation equivalence) and their associated relations. This means that storing these
relations together with a model is equivalent to storing the derivation used to create it.
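As a rough illustration of Definition 1, the three relations can be computed from a recorded derivation. The sketch below assumes, hypothetically, that each derivation step is stored as a pair (elements matched by the left-hand side, elements created by the step); the function name `dependency_relations` and the element names are illustrative only, and correspondence nodes of the running example are omitted.

```python
from itertools import product


def dependency_relations(steps):
    """Return (strict, inter, reach) for a list of (matched, created) steps.

    strict: pairs (e, e') with e matched by the step that created e'.
    inter:  pairs of elements created by the same step.
    reach:  for each element, the elements related to it by the reflexive
            and transitive closure of the union of strict and inter.
    """
    strict, inter, elements = set(), set(), set()
    for matched, created in steps:
        elements |= set(matched) | set(created)
        strict |= set(product(matched, created))
        inter |= set(product(created, created))

    reach = {e: {e} for e in elements}          # reflexive part
    for a, b in strict | inter:
        reach[a].add(b)
    changed = True
    while changed:                               # naive transitive closure
        changed = False
        for a in elements:
            new = set().union(*(reach[b] for b in reach[a])) - reach[a]
            if new:
                reach[a] |= new
                changed = True
    return strict, inter, reach


# Simplified running example: Class2Table creates c2 and t2 together,
# then Subclass2Table matches c2 and t2 and creates c3.
steps = [((), ("c2", "t2")), (("c2", "t2"), ("c3",))]
strict, inter, reach = dependency_relations(steps)
```

With this encoding, `reach["c2"]` contains every element that would be affected if `c2` were deleted.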


3 Synchronizing Solutions for Concurrent Updates 


According to what we discussed in the previous section, we consider the general problem
of concurrent synchronization when there may be conflicts in the given concurrent update.
Moreover, we assume that we are only interested in incremental solutions, which means that
our solutions are assumed to preserve a certain triple subgraph of the given consistent
model⁷. Finally, to avoid having to mention explicitly the TGG of the given synchronization
problem, we will consider that we are working with a fixed TGG, $\mathcal{G}$, which has been
given a priori.

6 It may be noted that there may be many derivations that lead to $G$; here we assume that
$d_G$ is the one chosen to generate it.


Definition 2 (Incremental Synchronizing Solutions). Given a concurrent update
$\bar{u}: G \Rightarrow H_0$, such that $G$ is a consistent model, and given a submodel
$G_0 \subseteq G$, a concurrent incremental solution of $\bar{u}$ with respect to $G_0$ is an update
$w: G \Rightarrow H$ such that $H$ is consistent and $d_H$ includes the derivation $d_{G_0}$. Then,
$SynchSol(G, G_0, \bar{u})$ is the set of all concurrent incremental solutions of $\bar{u}$ with
respect to $G_0$.


In general, a concurrent synchronization problem may have several possible solu- 
tions especially if it has some conflicts, because in this case there may be different op- 
tions of backtracking to eliminate the conflicts. To decide which solutions are “better”, 
we may use different criteria but, unfortunately, these criteria may be contradictory. For 
this reason, we believe that it should be the user who decides which is the preferred 
solution. Nevertheless, there are solutions which may be considered inadequate or not 
fully adequate. For instance, backtracking all updates, so that the final outcome is the
original consistent model, would technically be a correct solution, but we cannot consider
it adequate. The adequacy criteria that we consider are the following:


- Maximal covering: When $\bar{u}$ has conflicts, we would like our solution to backtrack
  as few additions and deletions in $\bar{u}$ as possible, because users decided these
  additions and deletions. In this sense, the solution $w$ has a maximal covering if $H$
  contains as many elements added by $\bar{u}$ as possible and as few elements deleted by
  $\bar{u}$ as possible. In this case, a solution would be optimal if $H$ includes
  all the elements added by $\bar{u}$ and no elements deleted by $\bar{u}$.

- Minimal information loss: The addition or deletion of an element in $\bar{u}$ may force
  the deletion of other elements from $G$. Since these elements may include some
  information, their deletion will cause an information loss in the model, which we
  would like to minimize. In this sense, the solution $w$ has minimal information loss
  if $H$ cannot be extended to a solution that contains more elements from $G$ without
  having more additions than $H$.

- Minimal unrelated additions: The addition or deletion of an element in $\bar{u}$ may cause
  the addition of other elements in $w$. For instance, if in our example we add a table
  the synchronization procedure will need to add its associated class. However, a
  solution may include other added elements that are not required by the given update.
  We consider that we should minimize this kind of additions.


Definition 3 (Properties of Synchronizing Solutions). Given a derivation
$d = SG \Rightarrow^* G \in D(\mathcal{G})$ and a concurrent update
$\bar{u}: G \leftarrow K_0 \rightarrow H_0$, we say that a consistent incremental solution
$w: G \leftarrow K \rightarrow H \in SynchSol(G, G_0, \bar{u})$ has:

1. Maximal covering: if there does not exist any other solution
   $v \in SynchSol(G, G_0, \bar{u})$, such that $w' < v'$, where $v'$ is the largest common
   subupdate of $v$ and $\bar{u}$, and $w'$ is the largest common subupdate of $w$ and $\bar{u}$,
   i.e., $v'$ (resp. $w'$) consists of all the additions and deletions that are both in
   $\bar{u}$ and $v$ (resp. $w$).
2. Minimal information loss: if there is no other update $v \in SynchSol(G, G_0, \bar{u})$, with
   $v = G \leftarrow K' \rightarrow H'$, such that $H \subsetneq H'$, $K \subset K'$ and
   $(H' \setminus K') = (H \setminus K)$.
3. Minimal unrelated additions: if for any element $x \in H$ added by $w$, there is an
   element $y \in H \cap G$, such that $x \leq y$.

7 We may notice that if that subgraph is the empty graph then we would be looking for all
possible solutions.
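The first two properties of Definition 3 can be read as comparisons between candidate solutions. The following sketch is a set-level approximation only: a solution is represented as a pair `(K, H)` of element sets, consistency and incrementality of the candidates are assumed rather than checked, and the function names and the small example are hypothetical.

```python
def covering(u_added, u_deleted, H):
    """Additions of the update present in H, and deletions of the update honoured by H."""
    return (u_added & H, u_deleted - H)


def has_maximal_covering(sol, others, u_added, u_deleted):
    """No other candidate covers strictly more of the user's update."""
    add_w, del_w = covering(u_added, u_deleted, sol[1])
    for _, H_v in others:
        add_v, del_v = covering(u_added, u_deleted, H_v)
        strictly_more = (add_w, del_w) != (add_v, del_v)
        if add_w <= add_v and del_w <= del_v and strictly_more:
            return False
    return True


def has_minimal_information_loss(sol, others):
    """No other candidate keeps strictly more of G with exactly the same additions."""
    K, H = sol
    return not any(
        H < H2 and K <= K2 and (H2 - K2) == (H - K) for K2, H2 in others
    )


G = frozenset({"c1", "t1", "c2", "t2"})
u_added, u_deleted = frozenset({"a"}), frozenset({"c2"})

identity = (G, G)                                                   # backtracks everything
sol_a = (frozenset({"c1", "t1"}), frozenset({"c1", "t1", "a", "co"}))
sol_b = (frozenset({"c1"}), frozenset({"c1", "a", "co"}))           # also drops t1
```

In this toy setting the identity solution fails maximal covering, and `sol_b` fails minimal information loss because `sol_a` keeps `t1` with the same additions.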


For instance, on the right of Fig. 2 we can see an example of a consistent solution 
which has maximal covering, minimal information loss and no unrelated additions. In 
contrast, in Fig. 3 neither the solution on the left nor the one in the middle have maximal 
covering, even though both of them are consistent. The solution on the right of Fig. 3 
has maximal covering, minimal information loss, and no unrelated additions, but it is 
not comparable with the one in Fig. 2. 


Fig. 3. Three other consistent solutions for the example in Fig. 2.


4 An Incremental Procedure CSynch 


In this section we propose a two-step nondeterministic incremental algorithm CSynch
that allows us to find all solutions to the concurrent synchronization problem that are
minimal, in the sense that they do not have unrelated additions, and that, moreover,
have maximal covering and minimal information loss. More precisely, depending on
the choices made we will get a different solution.

The algorithm is not based on propagation, but on using rules derived from the given
TGG which allow us to identify which elements are affected by the update, to identify
and solve possible conflicts, and to restore consistency. This identification is done by
a marking algorithm CMark that simulates the addition and deletion of elements by
applying these derived rules on the model, such that some of its elements have been
decorated with some marks from the set $\{+, \times, !, ?\}$. If an element $e$ is marked with any
of these marks, it means that $e$ has been added or deleted by a user ($+$ or $\times$, respectively),
it is required for an addition ($!$), or it is affected by a deletion ($?$). Technically, this means
that every element of the model has an attribute called $marks \subseteq \{+, \times, !, ?\}$ that denotes
the set of marks that an element has at a given moment. Initially, before starting the
synchronization process, it is assumed that $marks = \emptyset$ for every element of the model.
If it happens that, at a certain point, an element is marked with different marks, it may
denote an apparent conflict⁸, which may need to be solved.

Since we need to know the dependencies between the marked elements, we will
build extensions of the dependency relations of the given model $G$. These extended
relations are denoted $<'$, $\leq'$, and $\bowtie'$, i.e., $<^G \subseteq <'$, $\leq^G \subseteq \leq'$, and
$\bowtie^G \subseteq \bowtie'$. In addition, using these relations, CMark computes $G_0$, the submodel
of $G$ not affected by the update.

Once the model is marked, an algorithm CRepair detects and solves conflicts and 
repairs the model. This process removes the marks of some elements and deletes the 
rest of them, in such a way that the final outcome is a consistent triple graph. 


4.1 Marking 


Before defining the marking algorithm that we use in the first step of our synchroniza- 
tion procedure, let us first explain how we deal with additions and deletions. 

In our running example, let us suppose that the user has added an edge between
the attribute $a_1$ and the class $c_2$ (perhaps to apply a refactoring to the given system).
We know that in consistent models an edge between an attribute and a class is added
when applying rule $r_2: L \to R$, Attribute2Column. This rule, given a class and a table,
adds to a given model an attribute, a column, edges between the attribute and the class,
and between the column and the table, and a correspondence element that relates the
attribute to the column. So, the idea is that in the synchronization algorithm, we are
going to "simulate" the application of that rule to create the edge between $a_1$ and $c_2$,
and to do this, we are going to apply a rule $r_2': L' \to R'$, derived from $r_2$, but before
describing this rule, we have to take into account two questions:


1. Some of the elements that are created by $r_2$ may be already in the model. So, instead
   of creating them again, we include them in the left-hand side $L'$ of the derived rule.
   Similarly, some other elements created by $r_2$ may coincide with other elements
   added by $\bar{u}$; then we will consider that $r_2'$ creates also these elements. For instance,
   if $\bar{u}$ would create an attribute $a$ and an edge associating $a$ to a class $c$, in this case,
   the associated marking rule would create simultaneously the attribute and the edge.
   But this is not enough to ensure that the final outcome is consistent: we need to
   be sure that all the elements in $L'$ are in the final result. Otherwise, these elements
   could be deleted as a consequence of some other addition or deletion in the given
   update $\bar{u}$. For this reason, $r_2'$ will mark the elements in $L' \setminus L$ with $!$, expressing
   that they are required for the correctness of the result. In addition, the rule will also
   mark the elements created by the rule, i.e. in $R \setminus L$, with the mark $+$. This includes
   the elements added by $\bar{u}$, but also some other elements created by the rule. For
   instance, if we want to add the edge from $a_1$ to $c_2$ to the model, we must also add
   the edge from $co_1$ to $t_2$. Following these ideas, rule $r_2'$ is depicted on the left of Fig.
   4. Moreover, on the right of that figure we also show which would be the associated
   marking rule, when we add an attribute to a class in a given model.

2. Since in the resulting model, $H$, we assume that this edge from $a_1$ to $c_2$ is created
   using $r_2$, this means that we assume that in $H$, the edge was created together with
   the attribute $a_1$, the corresponding column $co_1$, the edge between $a_1$ and $c_2$, the edge
   between $co_1$ and $t_2$, and the correspondence element between $a_1$ and $co_1$. However,
   in the original model $G$ these elements were created using a different application of
   $r_2$, and together with them, some other elements were also created (in this case, the
   edges between $a_1$ and $c_1$ and between $co_1$ and $t_1$), that are still part of the model.
   So if we want $H$ to be consistent, we will need to delete these elements from the
   model. As a consequence, we will mark them with $?$, denoting that in principle,
   we have to delete them, and we will say that they have been revoked. Finally, if
   some elements are revoked, we may need to delete all the elements in the model
   that depend on them. So we will mark all these other elements with $?$, expressing
   that they may need to be deleted too.

8 As we will see, not all apparent conflicts are real conflicts.


The case of marking the deletions in the update $\bar{u}$ is simpler. If an element $x$ in $G$
is deleted by $\bar{u}$, we will just mark it with $\times$, denoting that $x$ has to be deleted and, as
before, we will mark all the elements that depend on $x$ with $?$.

Finally, if we call $G_0$ the graph consisting of all elements that have not been marked
with $+$, $\times$, or $?$ then, as a consequence of the way that the marking algorithm works,
we can be sure that $G_0$ is consistent (as shown in Thm. 1), since all elements in $G_0$
were already in $G$ and they do not depend on any element that is not in $G_0$. Hence,
building our solution by adding to $G_0$ some of the marked elements, using rules from
the given TGG, ensures that the final result is consistent. Moreover, the algorithm would
be incremental with respect to $G_0$, since its elements will not be processed by CRepair
(except for deleting some $!$ marks).


Definition 4 (Derived Marking Rules). We say that a triple graph $G$ is decorated with
marks if each of its elements has a marking attribute $marks \subseteq \{+, \times, !, ?\}$. Let us denote
as $RemAttr(G)$ the triple graph resulting from removing from $G$ the attribute marks.
Given the rule $r: L \to R$, we say that $r': L' \to R'$ is a derived marking rule from $r$ for
adding a set of elements $X$, if $L'$ and $R'$ are two decorated triple graphs such that:

1. $L \subseteq RemAttr(L') \subseteq R$, $RemAttr(R') = R$, and $X \subseteq R \setminus L$.
2. All elements in $RemAttr(L') \setminus L$ are included in $R'$ with the mark $!$.
3. All elements in $R \setminus L$ are included in $R'$ with the mark $+$.


For instance, the rule on the left of Fig. 4 is derived from the rule r2 Attribute2Column 
to add a new arrow from an already existing attribute to an already existing class in the 
model. Notice that the elements that are really new, i.e., produced by the application 
of r2, are marked with +, while the ones produced by r2 but reused from the model 
by the derived rule, are marked with !. The rule on the right is also derived from
Attribute2Column but now to add a new attribute to an existing class. As a consequence,
there are no reused elements that should be marked with !.

Fig. 4. Examples of derived marking rules. Left: derived marking rule from Attribute2Column for
the addition of the arrow from the attribute to the class. Right: derived marking rule from
Attribute2Column for the addition of the attribute and the edge to the class.
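The construction of a derived marking rule can be sketched as follows. This is an illustration under stated assumptions: rules are reduced to sets of element names, the function `derive_marking_rule` is hypothetical, and the marks follow the reading given in the explanation of Fig. 4, where elements reused from the model get ! and the genuinely new elements get +.

```python
def derive_marking_rule(L, R, reused):
    """Build (L', marks on R') for a rule L -> R and a set of reused elements.

    reused: elements of R \\ L that are already in the model and are therefore
            moved to the left-hand side of the derived rule.
    """
    L, R, reused = frozenset(L), frozenset(R), frozenset(reused)
    if not L <= R:
        raise ValueError("L must be included in R")
    if not reused <= R - L:
        raise ValueError("only elements created by the rule can be reused")

    lhs = L | reused                      # RemAttr(L'), so L <= RemAttr(L') <= R
    rhs_marks = {}
    for e in R:
        if e in reused:
            rhs_marks[e] = {"!"}          # required for the addition
        elif e in L:
            rhs_marks[e] = set()          # plain context, no mark
        else:
            rhs_marks[e] = {"+"}          # really new element
    return lhs, rhs_marks


# Attribute2Column, with hypothetical element names for the rule graph.
L = {"c", "t"}
R = L | {"a", "co", "corr", "a_c", "co_t"}

# Left of Fig. 4: attribute, column and correspondence are assumed to exist already.
lhs_left, marks_left = derive_marking_rule(L, R, reused={"a", "co", "corr"})
# Right of Fig. 4: nothing is reused, the attribute and its edge are both new.
lhs_right, marks_right = derive_marking_rule(L, R, reused=set())
```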


Now we can introduce the marking algorithm CMark following the explanations
given above. $A$ and $D$ will be the sets of elements that have to be added or deleted,
respectively. Initially, we assume that $A$ and $D$ consist of the elements added and deleted
by $\bar{u}$, and that the sets of marks are empty for all the elements in the model. Then:


Algorithm 1 (CMark Algorithm)
Initialize relations $<' = <^G$, $\leq' = \leq^G$, and $\bowtie' = \bowtie^G$.

1. Addition and revocation: For every element $x \in A$, select a marking rule $r': L' \to R'$
   derived from $r: L \to R$ that may be used to create $x$, and let $X \subseteq A$ be a set of
   elements that can also be created by $r'$:
   - Eliminate from $A$ the elements in $\{x\} \cup X$.
   - Apply $r': L' \to R'$.
   - Add $?$ to the attribute marks of every element which is not in $RemAttr(L') \setminus L$
     but is strictly interdependent with an element matched to $RemAttr(L') \setminus L$.
   - Add $?$ to the attribute marks of every element which is dependent on a $?$-marked
     element.
2. Update the dependency relations, adding the new dependencies and interdependencies
   defined by the application of the original rules used in 1. to relations $<'$, $\leq'$,
   and $\bowtie'$, and computing the new transitive closure.
3. Deletion:
   - Add $\times$ to the attribute marks of every element intended to be deleted.
   - Add $?$ to the attribute marks of every element that is dependent on a $\times$-marked
     element.
4. Computing $G_0$: Delete from $<'$, $\leq'$, and $\bowtie'$ all elements marked with $+$, $?$, or
   $\times$. Then $G_0$ would be the model generated by the derivation associated to the
   dependency relations.
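Steps 3 and 4 of CMark can be illustrated with a small sketch. It assumes, hypothetically, that the dependency relation is available as a map `dependents` from each element to the set of elements that depend on it (reflexive and transitive), that marks are stored in a dictionary of sets, and that the mark written × in the text is encoded as the string "x". The addition and revocation step is not covered here.

```python
def mark_deletions(marks, dependents, deleted):
    """Step 3: mark deleted elements with 'x' and their dependents with '?'."""
    for x in deleted:
        marks[x].add("x")
        # The element itself is skipped here; it already carries the 'x' mark.
        for y in dependents[x] - {x}:
            marks[y].add("?")


def unaffected(marks):
    """Step 4, set-level reading: elements carrying none of '+', 'x', '?'."""
    return {e for e, ms in marks.items() if not ms & {"+", "x", "?"}}


# Simplified running example: c1/t1 and c2/t2 are created pairwise, c3 depends on c2 and t2.
dependents = {
    "c1": {"c1", "t1"},
    "t1": {"t1", "c1"},
    "c2": {"c2", "t2", "c3"},
    "t2": {"t2", "c2", "c3"},
    "c3": {"c3"},
}
marks = {e: set() for e in dependents}
mark_deletions(marks, dependents, deleted={"c2"})
G0 = unaffected(marks)
```

Because `t2` is interdependent with `c2`, it is marked as affected even though the user did not touch it, so only `c1` and `t1` remain in `G0`.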


For instance, in the middle of Fig. 2 and Fig. 5 we can see examples of a marked 
model following the above algorithm. In the case of the example in Fig. 5, the concur- 
rent update would consist of adding a subclass relation between classes c and c2, in the 
source; and, adding a new sub-column sco3 to the column co? in the target. Again, in 
the model of the middle, some elements are marked with contradictory marks. Notice
that now, possible conflicts arise because of trying to integrate concurrent additions. In
fact, this example serves to illustrate that some additions may imply the deletion of
elements created by the original derivation; this is the case, for instance, of the table $t_2$.
That is, some additions may imply revocation of original derivation steps.


Fig. 5. Another example of a concurrent update, its marking, and a possible repair


We must note that this algorithm is nondeterministic since, when we want to add an ele- 
ment x to the model there may be more than one rule that can be used to create x. Then, 
choosing different rules will lead to different results of our synchronizing procedure. 


4.2 Repairing and Conflict-Solving 


The first idea underlying our repair algorithm, used as a second step of our synchronization
procedure, is to extend the model $G_0$, represented by the dependency relations $<'$,
$\leq'$, and $\bowtie'$, using rules from the given TGG, to include the elements that the user asked
to add to the model (i.e. added by the given update $\bar{u}$) and to reduce the information
loss. In particular, if in the marking process we decided to use a rule $r: L \to R$ to create
an element $x$ required by $\bar{u}$ (i.e., to create $x$ we used a marking rule $r''$ associated to $r$),
we will use another rule also derived from $r$, $r': L' \to R$, where $L \subseteq RemAttr(L') = R$,
that unmarks all the elements that we marked with $+$ or $!$, i.e., if we remove all marks
in $L'$ we get $R$. We call these rules derived recreating rules, because they create again
(by reusing them) some elements that were originally in $G$. We must note that, using
the information in the dependency relations $<'$, $\leq'$, and $\bowtie'$, we may know which is the
rule $r$. In particular, $L$ would consist of $x$ and all the elements $y$ such that $x >' y$, and $R$
would consist of all the elements $z$ such that $x <' z$.

This idea for reusing elements from the original model has already been used in 
[12,25]. Notice that these rules eliminate the marks from the recreated elements, and as 
a consequence, the recreated elements will be now part of the solution. 

The second idea for our repair algorithm is that we can also use derived recreation
rules for reducing the information loss, including in the solution elements that were
removed from the given model because they depended on elements that could have
been deleted.

Finally, the third idea on which our algorithm is based is that, if we try to create
an added element $x$ using the derived rule $r': L' \to R$, and an element of $L'$ is matched
to an element $y$ of the model having the mark $\times$, this means that we have discovered a
conflict between the deletion of $y$ and the addition of $x$. As a
consequence, we have two options: either we do not apply that rule, which is equivalent
to backtracking the addition of $x$, or we do apply the rule, which would be equivalent to
backtracking the deletion of the element including the mark $\times$.


Definition 5 (Derived Recreating Rules). Given a rule $r: L \to R$, we say that $r': L' \to R$
is a derived recreating rule⁹ from $r$ if $L \subseteq RemAttr(L') = R$, such that

1. The elements in $L'$ from $L$ must be matched to elements without marks.
2. The elements in $L'$ not in $L$ can be matched to elements with any mark.


For instance, in Fig. 6 we can see some examples of derived recreating rules.

Fig. 6. Examples of derived recreating rules (panel shown: derived recreating Class2Table)


9 To be precise, recreating rules are like standard DPO rules, i.e. of the form
$L' \leftarrow R \rightarrow R$, which means that the rule does not add anything, since it just deletes
the marks from the given elements. We may note that an unmarking rule is always applicable,
since the gluing conditions always hold. For details, we may look at [17,15].

Algorithm 2 (CRepair)


1. Recreating and conflict solving: While there is a recreation rule that can be applied:
   - If there is an element marked $+$ by a marking rule associated to a rule $r: L \to R$
     and, when trying to apply the associated recreating rule $r': L' \to R$, no element
     in $L'$ is matched to an element including the $\times$ mark, then apply $r'$ and modify
     accordingly the dependency relations of the solution, adding to $<$ and $\bowtie$ the
     dependency and interdependency relations between the elements matched by
     elements in $L$ and $R$, and computing the new transitive closure for $\leq$.
   - In the same situation as in the previous case, but where there is an element in $L'$
     that is matched to an element marked $\times$, choose between applying $r'$, modifying
     accordingly the dependency relations of the solution, or replacing the mark $+$
     by the mark $\times$ for all elements matched by $L'$ that include the mark $+$.
   - Otherwise, apply a recreating rule $r': L' \to R$ such that no element in $L'$ is
     matched to an element marked $\times$ and modify accordingly the dependency
     relations of the solution.
2. Removing: Delete every marked element.


That is, in step 1. of CRepair, we first try to recreate every + or ! element and to 
reduce information loss as much as possible. However, when we detect a conflict when 
trying to recreate an element marked +, it is nondeterministically chosen between ap- 
plying the addition or the deletion. And in step 2 all elements still marked are removed 
from the model, because they needed to be deleted or because it was not possible to 
recreate them. 
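The conflict-handling choice in step 1 of CRepair, followed by the removal step, can be sketched for a single recreating rule match. This is only an illustration under assumptions that are not in the text: the nondeterministic choice is modelled by a boolean parameter `prefer_addition`, marks are stored in a dictionary of sets with × encoded as "x", dependency bookkeeping is omitted, and elements that carry only the ! mark are kept in the removal step (following the remark that elements of G0 are only processed to delete ! marks).

```python
def recreate(marks, context, created, prefer_addition):
    """Try one recreating rule whose L' matches `context` (from L) and `created`.

    Returns "not applicable", "applied", or "backtracked".
    """
    if any(marks[e] for e in context):
        return "not applicable"            # elements from L must be unmarked
    conflict = any("x" in marks[e] for e in created)
    if not conflict or prefer_addition:
        for e in created:
            marks[e].clear()               # unmarking puts the elements in the solution
        return "applied"
    for e in created:                      # backtrack the addition instead
        if "+" in marks[e]:
            marks[e].discard("+")
            marks[e].add("x")
    return "backtracked"


def remove_marked(marks):
    """Step 2: keep elements with no marks, or with only the '!' mark (an assumption)."""
    return {e for e, ms in marks.items() if not ms - {"!"}}


def fresh_marks():
    # A deleted attribute a1, a required column co1 and a user-added edge.
    return {"c1": set(), "t1": set(), "a1": {"x"}, "co1": {"!"}, "edge": {"+"}}


keep_addition = fresh_marks()
r1 = recreate(keep_addition, {"c1", "t1"}, {"a1", "co1", "edge"}, prefer_addition=True)

keep_deletion = fresh_marks()
r2 = recreate(keep_deletion, {"c1", "t1"}, {"a1", "co1", "edge"}, prefer_addition=False)
```

The two calls correspond to the two ways of resolving the same conflict: restoring the deleted attribute, or giving up the added edge.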


Algorithm 3 (CSynch) 


1. Apply CMark.
2. Apply CRepair.

The resulting update is $w: G \Rightarrow H$, where $H$ is the result obtained by CSynch.


4.3 Properties of CSynch 


In this subsection we prove the properties that our algorithm satisfies. Firstly, we will 
prove that all solutions obtained by CSynch are consistent, incremental and they have 
no unrelated additions. We will also prove that CSynch can compute all incremental so- 
lutions that, in addition, have maximal covering and minimal information loss, provided 
that the right choices are made. 


Theorem 1. Given a consistent model $G$ and an update $\bar{u}: G \Rightarrow H_0$, if the update
$w: G \Rightarrow H$ is a solution obtained by CSynch, then:

1. $H$ is consistent.
2. $w$ is incremental with respect to the triple graph $G_0$ computed by CMark.
3. $w$ has no unrelated additions.
4. $w$ has minimal information loss.

Proof. The last three properties are just a consequence of how CSynch is defined. Let
us prove that $H$ is consistent, but first we will prove that $G_0$ is consistent.

Let $SG \Rightarrow G_1 \Rightarrow \dots \Rightarrow G_k \Rightarrow \dots \Rightarrow G$ be the derivation
used to create $G$ and let $SG \Rightarrow G_1 \Rightarrow \dots \Rightarrow G_i$ be its longest
subderivation such that $G_i \subseteq G_0$; let us show that $G_i = G_0$, which implies the
consistency of $G_0$.

Suppose that there is an element $x \in G_0$, which means that $x$ is not marked with $+$, $?$,
or $\times$ and it does not depend on any marked element, such that $x \notin G_i$. Let $k$ be the
earliest derivation step $G_k \Rightarrow G_{k+1}$, with $i < k$, where an element
$x \in (G_0 \setminus G_i)$ was generated, i.e., $x \in (G_{k+1} \setminus G_k)$. By definition of
$\bowtie$, we know that $x \bowtie y$ for every $y \in G_{k+1} \setminus G_k$, and according to the
definition of CMark, if $x$ does not have any of those marks, then $y$ does not either. This
means that $G_{k+1} \setminus G_k \subseteq G_0$. Now, if $r: L \to R$ is the rule applied in the
derivation step $G_k \Rightarrow G_{k+1}$ then there are two possibilities:

1. If the elements matched by $L$ in $G_k$ are already in $G_i$, it would mean that this
   derivation step is sequentially independent from all derivation steps
   $G_i \Rightarrow \dots \Rightarrow G_k$ and we would have $G_i \Rightarrow G'_{i+1}$, with
   $G'_{i+1} \subseteq G_0$, contradicting the hypothesis that
   $SG \Rightarrow G_1 \Rightarrow \dots \Rightarrow G_i$ was the longest subderivation such that
   $G_i \subseteq G_0$.

2. If the elements matched by $L$ in $G_k$ are not in $G_i$, it would mean that $x$ depends on
   elements added in the derivation $G_i \Rightarrow \dots \Rightarrow G_k$. Moreover, we know that
   all the elements $y$ generated in that derivation such that $y < x$ are not marked with $+$,
   $?$, or $\times$ and therefore they are included in $G_0$, because otherwise $x$ would not be in
   $G_0$. But this contradicts the hypothesis that $x$ was an element in $G_0 \setminus G_i$ added in
   the earliest possible derivation step.


To prove that $H$ is consistent, it is enough to notice that, because of how recreation
rules are defined, if $G_i$ is a consistent unmarked subgraph of a marked graph $G'_i$, and
$r': L' \to R$ is a recreating rule associated to the rule $r: L \to R$, then if
$G'_i \Rightarrow_{r'} G'_{i+1}$ we have that $G_i \Rightarrow_r G_{i+1}$ such that $G_{i+1}$ is a
consistent subgraph of $G'_{i+1}$. In particular, if $G'_0$ is the result obtained by applying the
marking algorithm to $G$ and $G_0$ is its unmarked consistent subgraph, then applying the first
step of CRepair leads to a sequence of recreation transformations $G'_0 \Rightarrow^* G'_k$. This
means that given the associated sequence of transformations $G_0 \Rightarrow^* G_k$, we have that
$G_k$ is a consistent subgraph of $G'_k$. Finally the second step of CRepair leads to $H = G_k$,
which is the final result of CSynch. □


Before showing the rest of the properties of CSynch, we must first define which
is the subgraph $G_0$ such that any solution of CSynch is incremental with respect to it. In
general, depending on the choices of CSynch, the consistent model $G_0$ computed by
CMark may be different, because the choice of rules used to add to $G$ the elements
added by $\bar{u}$ defines different markings. So, if we want all solutions of CSynch to be
incremental over the same graph, we can take the intersection of all these $G_0$.


Definition 6 (Set of Computed Solutions). Given a consistent model $G \in L(\mathcal{G})$ and a
concurrent update $\bar{u}: G \leftarrow K_0 \rightarrow H_0$, we denote by $CSynch(G, \bar{u})$ the set
of all possible solutions computed by the algorithm CSynch when $G$ and $\bar{u}$ are given.

If for every $w: G \leftarrow K \rightarrow H$, we denote by $G_w$ the subgraph of $G$ computed by
CMark, then we define $M(G, \bar{u})$ as:
$M(G, \bar{u}) = \bigcap_{w \in CSynch(G, \bar{u})} G_w$

Obviously, if $w$ is incremental over $G_w$, then $w$ is also incremental with respect to
any submodel of $G_w$.
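At the level of element sets, the graph M(G, ū) of Definition 6 is just the intersection of the unaffected submodels obtained in the different runs. A minimal sketch, assuming each run of CMark is summarized by the set of elements of its G_w (the function name `common_unaffected` and the sample sets are hypothetical):

```python
def common_unaffected(runs):
    """Intersect the unaffected submodels G_w computed by CMark in each run."""
    runs = [frozenset(r) for r in runs]
    if not runs:
        raise ValueError("at least one run of CMark is needed")
    return frozenset.intersection(*runs)


# Two runs that chose different marking rules and hence preserved different parts of G.
M = common_unaffected([{"c1", "t1", "c2"}, {"c1", "t1", "t3"}])
```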


Proposition 1. If $w \in CSynch(G, \bar{u})$ then $w$ is incremental over $M(G, \bar{u})$.


Finally, we can show that our algorithm is complete, i.e., that any consistent update
that is incremental over $M(G, \bar{u})$, and satisfies the required properties, can be found by
CSynch.


Theorem 2. Given a consistent model $G$ and an update $\bar{u}: G \Rightarrow H_0$, if
$w: G \leftarrow K \rightarrow H$ is consistent, it has no unrelated additions, it has maximal covering
and minimal information loss, and it is incremental over $M(G, \bar{u})$ then
$w \in CSynch(G, \bar{u})$.


Proof. If $w: G \leftarrow K \rightarrow H$ is a consistent update, such that it has no unrelated
additions, it has maximal covering and minimal information loss, and it is incremental over
$M(G, \bar{u})$, this means that there is a derivation
$d = SG \Rightarrow H_1 \Rightarrow \dots \Rightarrow H_k \Rightarrow \dots \Rightarrow H$ such that
$M(G, \bar{u}) \subseteq H_k$ for some $k$. Then, if we make the right choices in CSynch we will
compute the solution $w$. In particular, if CMark uses the same rule applications that are used
in $d$ to generate the additions in $\bar{u}$, on the one hand, it will compute a model $G_w$ that
will be preserved by CRepair and that includes $M(G, \bar{u})$. On the other hand, CMark will
mark the model in such a way that if CRepair chooses the same rule applications (and
in the same order) as in $d$, it will compute $H$.


5 Related Work 


The concurrent synchronization problem can be considered as a special case of the 
general problem of model (or graph) repair. In particular, in our case, a triple graph can 
be easily represented by a single graph, so the consistency problem for triple graphs can 
be seen as a special case of the consistency problem for graphs. The literature on model 
repair is quite large (see [23] for an excellent survey on this topic), so it does not make 
much sense to review all the existing approaches. 

Concentrating on the problem of concurrent model synchronization, to our knowledge, 
the only works addressing the general problem¹⁰ of concurrent synchronization 
are [38,14,11,24,34,35]. All these approaches are propagation-based, which means that 
synchronization is performed by, first, propagating the updates in one model to the other 
model, then checking whether there is any conflict between the propagated updates and the 
ones previously applied to that model and, if there is, solving the conflicts in some 
given way, and finally, propagating the updates in the second model back to the first 
one. That is, concurrent synchronization is sequentialized. In all cases it is shown that 
the result obtained is correct, but no other properties are shown. In particular, in all 
these approaches, except in [38], the trivial solution obtained by backtracking all the 
updates would be considered a valid solution. On the other hand, the approach presented 


10 There is some work considering this problem in a more restrictive setting. For instance, in 
[26] models are restricted to tree-like structures and the target model is an abstract view of the 
source; and in [36] updates must be defined in terms of a given set of operations. 

in [38] may be unable to find some existing solutions, as shown in [24]. Actually, that 
paper shows that propagation-based approaches have important limitations. 

Our approach to incrementality is based on the ideas presented in [25], for the se- 
quential synchronization case. Other approaches based on TGGs that propose incremen- 
tal solutions to sequential model synchronization are [10,16,12,22] (and some variations 
on them) but all of them are, in our opinion, not completely satisfactory. In particular, 
even if the construction of the solution does not start from scratch but from the given 
consistent model G, the approaches in [16,12,22] have to analyze the whole model G 
(for instance, to know what parts of G must be modified) so their cost depends on the 
size of the given model. This is not the case of [10], but their approach only works for 
the case when source and target models are bijective, which excludes the case where 
source models are views of target models (or vice versa). In addition, in [10,16,22] there 
may be information loss, which we avoid using the approach developed in [12] and also 
used in [25]. 


6 Conclusion 


In this paper we have presented some properties that ensure the adequacy of solu- 
tions for a concurrent synchronization problem, together with an incremental non- 
deterministic algorithm that is able to return all possible sound solutions that, in ad- 
dition, satisfy these properties. 

Most existing algorithms for model synchronization return just one solution. We 
believe that this is not adequate, especially in the case of concurrent synchronization. In 
that context, one concrete solution corresponds to a specific way of solving the existing 
conflicts, which may not be the way that the user would have preferred. For this reason, 
we decided that completeness of the algorithm was an important issue. It is clear that, 
in practice, delivering to a user a relatively large set of solutions is not very convenient. 
However, we think that this is something to take into account at the implementation 
level, for instance, by showing conflicts in a stepwise way and, then, showing the dif- 
ferent ways of solving each conflict. 

From a theoretical viewpoint, our algorithm works for any kind of graphs. However, 
in practice, if the models have attributes, our algorithm would not be adequate. For 
example, let us suppose that we are working with a class of models where a certain 
attribute $a_1$ must be equal to the sum of attributes $a_2$ and $a_3$, and let us suppose that 
we are trying to synchronize a model $G$, where $a_1$ has some given value $v_1$, but $a_2$ and 
$a_3$ have no value, i.e., the synchronization algorithm should provide values to $a_2$ and $a_3$ 
such that their sum equals $v_1$. In this context, our algorithm would deliver infinitely many 
solutions, assigning to $a_2$ and $a_3$ all possible values $v_2$ and $v_3$ such that $v_1 = v_2 + v_3$. In 
general, dealing with attributed graphs in the context of sequential or concurrent model 
synchronization poses problems that are described in [1,21]. 

As future work, on the one hand, we plan to address the case of attributed models 
and, on the other hand, to extend our results to the multimodel case, i.e. when synchro- 
nizing more than two models. This case has specific complications, see, for instance 
[32,4]. It has already been approached in [34,35], but just as a straightforward general- 
ization of [14], which means that it shares its limitations. 

Abstract. We propose a new Statistical Model Checking (SMC) method 
to discover bugs in variability-intensive systems (VIS). The state-space 
of such systems is exponential in the number of variants, which makes 
the verification problem harder than for classical systems. To reduce ver- 
ification time, we sample executions from a featured transition system — 
a model that represents jointly the state spaces of all variants. The com- 
bination of this compact representation and the inherent efficiency of 
SMC allows us to find bugs much faster (up to 16 times according to our 
experiments) than other methods. As with any simulation-based approach, 
however, the risk of Type-1 error exists. We provide a lower bound and 
an upper bound for the number of simulations to perform to achieve the 
desired level of confidence. Our empirical study involving 59 properties 
over three case studies reveals that our method manages to discover all 
variants violating 41 of the properties. This indicates that SMC can act 
as a low-cost-high-reward method for verifying VIS. 


1 Introduction 


We consider the problem of bug detection in Variability Intensive Systems (VIS). 
This category of systems encompasses any system that can be derived into multi- 
ple variants (differing, e.g., in provided functionalities), including software prod- 
uct lines [12] and configurable systems [32]. Compared to traditional (“single”) 
systems, the complexity of bug detection in VIS is increased: bugs can appear 
only in some variants, which requires analysing the peculiarities of each variant. 
Among the many techniques developed for bug detection, one finds testing 
and model checking. Testing [6] executes particular test inputs on the system 
and checks whether they trigger a bug. Although testing remains widely used 
in industry, the rise of concurrency and inherent system complexity has made 
system-level test case generation a hard problem. Also, testing is often limited 
to bounded reachability properties and cannot assess liveness properties. 
Model checking [2] is a formal verification technique which checks that all 
behaviours of the system satisfy specified requirements. These behaviours are 
typically modelled as an automaton, where each node represents a state of the 


© The Author(s) 2020 

system (e.g. a valuation of the variables of a program and a location in this pro- 
gram’s execution flow) and where each transition between two states expresses 
that the program can move from one state to the other by executing a sin- 
gle action (e.g. executing the next program statement). Requirements are often 
expressed in temporal logics, e.g. the Linear Temporal Logic (LTL) [31]. 


Such logics capture both safety and liveness properties of system behaviours. 
As an example, consider the LTL formula $\Box(command\_sleep \Rightarrow \Diamond system\_sleep)$. 
command_sleep and system_sleep are logic atoms and represent, respectively, a 
state where the sleep command is input and another state where the system 
enters sleep mode. The symbols $\Box$ and $\Diamond$ mean always and eventually, respectively. 
Thus, the whole formula expresses that “it is always the case that when 
the sleep command is input, the system eventually enters sleep mode”. 
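
To make the meaning of this formula concrete, the following sketch checks the "always ... implies eventually ..." pattern on a lasso-shaped trace, i.e. a finite prefix followed by a cycle that repeats forever. The function name, the encoding of a trace as lists of sets of atom names, and the sample traces are assumptions made for illustration only; they do not come from the paper.

```python
def holds_always_implies_eventually(prefix, cycle, p, q):
    """Check G(p -> F q) on the infinite trace prefix . cycle^omega.

    prefix and cycle are lists of sets of atom names; cycle must be non-empty.
    """
    if not cycle:
        raise ValueError("the cycle of a lasso must contain at least one state")
    # If q occurs in the cycle, it occurs infinitely often, so every p is answered.
    if any(q in state for state in cycle):
        return True
    # Otherwise q never occurs in the cycle: any p in the cycle is never answered.
    if any(p in state for state in cycle):
        return False
    # Only the prefix matters now: each p must be followed (or accompanied) by a q.
    pending = False
    for state in prefix:
        if p in state:
            pending = True
        if q in state:
            pending = False
    return not pending


# Hypothetical traces: the command is issued once, then the system sleeps forever.
ok_trace = ([{"command_sleep"}, set()], [{"system_sleep"}])
# The command is issued but the system never enters sleep mode.
bad_trace = ([{"command_sleep"}], [set()])
print(holds_always_implies_eventually(*ok_trace, "command_sleep", "system_sleep"))
print(holds_always_implies_eventually(*bad_trace, "command_sleep", "system_sleep"))
```

The second trace is the kind of counterexample a model checker would report for this property.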


Contrary to testing, model checking is exhaustive: if a bug exists then the 
checking algorithm outputs a counterexample, i.e. an execution trace of the sys- 
tem that violates the verified property. Exhaustiveness makes model checking 
an appealing solution to obtain strong guarantees that the system works as in- 
tended. It can also nicely complement testing (whose main advantage remains 
to be applied directly on the running system), e.g. by reasoning over liveness 
properties or by serving as oracle in test generation processes [1]. Those bene- 
fits, however, come at the cost of scalability issues, the most prominent being 
the state explosion problem. This term refers to the phenomenon where the state 
space to visit is so huge that an exhaustive search is intractable. As an illustration 
of this, let us remark that the theoretical complexity of the LTL model-checking 
problem is PSPACE-complete [37]. 


Model checking complexity is further exacerbated when it comes to VIS. In- 
deed, in this case, the model-checking problem requires verifying whether all 
the variants satisfy the requirements [11]. This means that, if the VIS comprises 
n variation points (n features in a software product line or n Boolean options 
in a configurable system), the number of different variants to represent and to 
check can reach $2^n$. This exponential factor adds to the inherent complexity of 
model checking. Thus, checking each variant (or models thereof) separately — 
an approach known as enumerative or product-based [34] — is often intractable. 
To alleviate this, variability-aware models and companion algorithms were pro- 
posed to represent and check efficiently the behaviour of all variants at once. For 
instance, Featured Transition Systems (FTS) [11] are transition systems where 
transitions are labelled with (a symbolic encoding of) the set of variants able to 
exercise this transition. The structure of FTS, if well constructed, allows one to 
capture in a compact manner commonalities between states and transitions of 
several variants. Exploiting that information, family-based algorithms can check 
only once the executions that several variants can execute and explore the state 
space of an individual variant only when it differs from all the others. In spite 
of positive improvements over the enumerative approach, state-space explosion 
remains a major challenge. 


In this work, we propose an alternative technique for state-space exploration 
and bug detection in VIS. We use Statistical Model Checking (SMC) [26] as a 
trade-off between testing and model checking to verify properties (expressed in 
full LTL) on FTS. The core idea of SMC is to conduct some simulations (i.e. 
sample executions) of the system (or its model) and verify if these executions 
satisfy the property to check. The results are then used together with statistical 
tests to decide whether the system satisfies the property with some degree of 
confidence. Of course, in contrast with an exhaustive approach, a simulation- 
based solution does not guarantee a result with 100% confidence. Still, it is 
possible to bound the probability of making an error. Simulation-based methods 
are known to be far less memory- and time-consuming than exhaustive ones, and 
are sometimes the only viable option. Over the past years, SMC has been used 
to, e.g., assess the absence of errors in various areas from aeronautics to systems 
biology; measure average cost and energy consumption for complex applications 
such as nanosatellites; and detect rare bugs in concurrent systems [10, 21, 25]. 
Given an LTL formula and an FTS, our family-based SMC method samples 
executions from all variants at the same time. Doing so, it avoids sampling twice 
(or more) executions that exist in multiple variants. Merging the individual state 
spaces biases the results, though, as it changes the probability distribution of the 
executions. This makes the problem different from previous methods intended 
for single systems (e.g. [20]) and obliges us to revisit the fundamentals of SMC in 
the light of VIS. In particular, we want to characterize the number of execution 
samples required to bound the probability of Type-1 error by a desired degree 
of confidence. We provide a lower bound and an upper bound for this number 
by reducing its computation to particular instances of the coupon problem [4]. 
We implemented our method within ProVeLines [17], a model checker for VIS. 
We provide empirical evidence, based on 3 case studies totalling 59 properties 
to check, that family-based SMC is a viable approach to verify VIS. Our study 
shows that our method manages to find all buggy variants in 41 properties and 
does so up to 16 times faster than state-of-the-art model-checking algorithms 
for VIS [11]. Moreover, our approach can achieve a median bug detection rate 3 
times higher than classical SMC applied to each variant individually. The hardest 
cases arise when the state space of some variant is substantially smaller than those 
of the others. This leads to a reduced probability of finding a bug in those variants. 


2 Background on Model Checking 


In model checking, the behaviour of the system is often represented as a transition 
system $(S, \Delta, AP, L)$ where $S$ is a set of states, $\Delta \subseteq S \times S$ is the transition 
relation, $AP$ is a set of atomic propositions³ and $L : S \to 2^{AP}$ labels any state 
with the atomic propositions that the system satisfies when in such a state. 


2.1 Linear Temporal Logic 


LTL is a temporal logic that allows specifying desired properties over all future 
executions of some given system. Given a set AP of atomic propositions, an LTL 


3 Atomic propositions can be seen as basic observable properties of the system state. 

formula $\phi$ is formed according to the following grammar: 
$\phi ::= \top \mid a \mid \phi_1 \wedge \phi_2 \mid \neg\phi_1 \mid \bigcirc\phi_1 \mid \phi_1 U \phi_2$, 
where $\phi_1$ and $\phi_2$ are LTL formulae, $a \in AP$, $\bigcirc$ is the next 
operator and $U$ is the until operator. We also define $\Diamond\phi$ (“eventually” $\phi$) and 
$\Box\phi$ (“always” $\phi$) as shortcuts for $\top U \phi$ and $\neg\Diamond\neg\phi$, respectively. 

Vardi and Wolper have presented an automata-based approach for checking 
that a system, modelled as a transition system $ts$, satisfies an LTL formula 
$\phi$ [37]. Their approach consists of, first, transforming $\phi$ into a Büchi automaton 
$B_{\neg\phi}$ whose language is exactly the set of executions that violate $\phi$, that is, those 
that visit infinitely often a so-called accepting state. Such an execution $\sigma$ takes the 
form of a lasso, i.e. $\sigma = q_0 \ldots q_n$ with $q_j = q_n$ for some $j$ and where $q_i$ is accepting 
for some $i$: $j \le i \le n$. We name accepting any such lasso whose cycle contains 
an accepting state. 

The second step is to compute the synchronous product of $ts$ and $B_{\neg\phi}$, which 
results in another Büchi automaton $B_{ts \otimes \neg\phi}$. Any accepting lasso in $B_{ts \otimes \neg\phi}$ 
represents an execution of the system that violates $\phi$. Thus, Vardi and Wolper’s 
algorithm comes down to checking the absence of such an accepting lasso in the 
whole state space of $B_{ts \otimes \neg\phi}$. The size of this state space is $O(|ts| \times 2^{|\phi|})$ and 
the underlying model-checking problem is PSPACE-complete. 


2.2 Statistical Model Checking 


Originally, SMC was used to compute the probability to satisfy a bounded LTL 
property for stochastic systems [39]. The idea was to monitor the properties on 
bounded executions represented by Bernoulli variables and then use Monte Carlo 
to estimate the resulting probability. SMC also applies to non-stochastic systems 
by assuming an implicit uniform probability distribution on each state successor. 

Grosu and Smolka [20] lean on this and propose an SMC method to address 
the full LTL model-checking problem. Their sampling algorithm walks randomly 
through the state space of $B_{ts \otimes \neg\phi}$ until it finds a lasso. They repeat the process 
$M$ times and conclude that the system satisfies the property if and only if none 
of the $M$ lassos is accepting. They also show that, given a confidence ratio $\delta$ and 
assuming that the probability $p$ that a sampled execution of the system is an accepting 
lasso exceeds an error margin $\epsilon$, setting $M = \frac{\ln(\delta)}{\ln(1-\epsilon)}$ bounds the probability 
of a Type-1 error (rejecting the hypothesis that the system violates the property while 
it actually violates it) by $\delta$. Thus, $M$ can serve as a minimal number of samples to 
perform. Our work extends theirs in order to support VIS instead of single systems. 
Other work on applying SMC to the full LTL logic can be found in [18,38]. 
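
As a small worked illustration of the Grosu and Smolka bound $M = \ln(\delta)/\ln(1-\epsilon)$, the sketch below rounds the bound up to an integer number of samples. The function name and the sample values of $\delta$ and $\epsilon$ are assumptions chosen for this example, not values used in the paper.

```python
import math


def min_samples(delta: float, epsilon: float) -> int:
    """Smallest integer M such that (1 - epsilon) ** M <= delta.

    delta is the tolerated Type-1 error, epsilon the assumed lower bound on the
    probability that one sampled lasso is accepting.
    """
    if not (0.0 < delta < 1.0 and 0.0 < epsilon < 1.0):
        raise ValueError("delta and epsilon must lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(1.0 - epsilon))


# Hypothetical setting: 5% tolerated error, accepting lassos have probability >= 1%.
m = min_samples(delta=0.05, epsilon=0.01)
print(m, (1 - 0.01) ** m)
```

The bound grows roughly like $1/\epsilon$, so rare counterexamples quickly dominate the sampling budget.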


2.3 Model Checking for VIS 


Applying classical model checking to VIS requires iterating over all variants, 
constructing their corresponding automata $B_{ts \otimes \neg\phi}$ and searching for accepting lassos 
in each of these. This enumerative method (also named product-based [34]) fails 
to exploit the fact that variants have behaviour in common. 

As an alternative, researchers came up with models able to capture the be- 
haviour of multiple variants and distinguish between the unique and common 

Fig. 1: An example of FBA with two variants. 


behaviour of those variants [3,8,11]. Among such models, we focus on featured 
transition systems [11] as those can link an execution to the variants able to 
execute it more directly than the alternative formalisms. In a nutshell, FTS ex- 
tend the standard transition system by labelling each transition with a symbolic 
encoding of the set of variants able to exercise this transition. Then, the set of 
variants that can produce an execution $\pi$ is the intersection of all sets of variants 
associated with the transitions in $\pi$. 
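
The intersection described above can be sketched in a few lines. The dictionary `GAMMA`, the state numbers and the variant names below are hypothetical and only serve to illustrate the idea; the paper encodes variant sets symbolically rather than as explicit Python sets.

```python
from functools import reduce


def variants_of_execution(execution, gamma, all_variants):
    """Intersect the variant sets labelling each transition of an execution."""
    transitions = zip(execution, execution[1:])
    return reduce(lambda acc, t: acc & gamma[t], transitions, frozenset(all_variants))


# Hypothetical labelling: transition -> set of variants able to exercise it.
GAMMA = {
    (1, 3): frozenset({"v1", "v2"}),
    (3, 6): frozenset({"v2"}),
    (3, 7): frozenset({"v1"}),
    (6, 5): frozenset({"v2"}),
}

print(variants_of_execution([1, 3, 6, 5], GAMMA, {"v1", "v2"}))  # only v2 remains
```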

To check which variants violate a given LTL formula $\phi$, one can adapt the 
procedure of Vardi and Wolper and build the synchronous product of the featured 
transition system with $B_{\neg\phi}$ [11]. This product is similar to the Büchi automaton 
obtained in the single system case, except that its transitions are also labelled 
with a set of variants.⁴ Then, the buggy variants are those that are able to 
execute the accepting lassos of this automaton. This generalized automaton is 
the fundamental formalism we work on in this paper. 


Definition 1 Let $V$ be a set of variants. A Featured Büchi Automaton (FBA) 
over $V$ is a tuple $(Q, \Delta, Q_0, A, \Theta, \gamma)$ where $Q$ is a set of states, $\Delta \subseteq Q \times Q$ 
is the transition relation, $Q_0 \subseteq Q$ is a set of initial states, $A \subseteq Q$ is the set of 
accepting states, $\Theta$ is the whole set of variants, and $\gamma : \Delta \to 2^{\Theta}$ associates each 
transition with the set of variants that can execute it. 


Figure 1 shows an FBA with two variants and eight states. State 5 is the 
only accepting state. Both variants can execute the transition from State 3 to 
State 4, whereas only variant $v_2$ can move from State 3 to State 6. 

The Büchi automaton corresponding to one particular variant $v$ is derived by 
removing the transitions not executable by $v$. That is, we remove all transitions 
$(q,q') \in \Delta$ such that $v \notin \gamma(q,q')$. The resulting automaton is named the projection 
of the FBA onto $v$. For example, one obtains the projection of the FBA in 


4 Those labels are equal to those found in the corresponding transitions of the featured 
transition system. 

Figure 1 onto $v_2$ by removing the transition from State 3 to State 7 and those 
between State 7 and State 8. 
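
The projection can be illustrated as a simple filter over the labelled transitions. Since Figure 1 is not reproduced here, the labelling `GAMMA` below is an assumed reconstruction that is merely consistent with the statements made about the figure in the text; the labels of transitions the text does not discuss (for instance 2 to 1 and 6 to 5) are guesses.

```python
BOTH = frozenset({"v1", "v2"})
V1 = frozenset({"v1"})
V2 = frozenset({"v2"})

# Assumed reconstruction of the FBA of Figure 1: transition -> variants.
GAMMA = {
    (1, 2): V2, (2, 1): V2,
    (1, 3): BOTH,
    (3, 4): BOTH, (4, 5): BOTH, (5, 3): BOTH,
    (3, 6): V2, (6, 5): V2,
    (3, 7): V1, (7, 8): V1, (8, 7): V1,
}


def project(gamma, variant):
    """Keep only the transitions that the given variant can execute."""
    return {t: vs for t, vs in gamma.items() if variant in vs}


removed = set(GAMMA) - set(project(GAMMA, "v2"))
print(sorted(removed))  # transitions dropped when projecting onto v2
```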


2.4 Other Related Work 


Recent work has applied SMC in the context of VIS. In [36], the authors pro- 
posed an algebraic language to describe (quantitative) behavioural variability in 
a dynamic manner. While their work shares some similarities with ours, there 
are fundamental differences. First, we seek to guarantee the absence of bugs 
in all variants of the family (applying family-based concepts), while they focus 
on dynamic feature interactions (on a product-based basis). The second differ- 
ence is that they consider quantitative bounded properties, while we support the 
entire LTL verification problem by extending the multi-lasso concept of [20, 28]. 

Another related, yet different area is the sampling of VIS variants (e.g. [27, 
30]). Such work considers the problem of uniformly sampling variants in order to 
study their characteristics (e.g. performance [22] and other quality requirements 
[15]) and infer those of the other variants. Recently, Thüm et al. [35] surveyed 
different strategies for the performance analysis of VIS, including the sampling 
of variants and family-based test generation, which is based on the same idea of 
executing test cases common to multiple variants. Contrary to us, such works 
do not consider temporal/behavioural properties and most of them perform the 
sampling based on a static representation of the variant space (i.e. a feature 
model [23]). An interesting direction for future work is to combine our family- 
based SMC with sampling techniques to check only representative variants of 
the family. 


3 Family-Based Statistical Model Checking 


The purpose of SMC is to reduce the verification effort (when visiting the state 
space of the system model) by sampling a given number of executions (i.e. lassos). 
This gain in efficiency, however, comes at the risk of Type-1 errors. Indeed, while 
the discovery of a counterexample leads with certainty to the conclusion that the 
variants able to execute it violate the property $\phi$, the fact that the sampling did 
not find a counterexample for some variant $v$ does not entail a 100% guarantee 
that $v$ satisfies $\phi$. The more lassos we sample, the more confident we can get that 
the variants without counterexamples satisfy $\phi$. Thus, designing a family-based 
SMC method involves answering three questions: (1) how to sample executions; 
(2) how to choose a suitable number of executions; (3) what is the associated 
probability of Type-1 error. 


3.1 Random Sampling in Featured Büchi Automata 


One can sample a lasso in an FBA by randomly walking through its state space, 
starting from a randomly-chosen initial state and ending as soon as a cycle is 
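found. A particular restriction is that this lasso should be executable by at least 

Input: $fba = (Q, \Delta, Q_0, A, \Theta, \gamma)$ 
Output: $(\sigma, \Theta_\sigma, accept)$ where $\sigma$ is a lasso of $fba$, $\Theta_\sigma$ is the set of the 
        variants able to execute $\sigma$, and $accept$ is true iff $\sigma$ is accepting. 
 1  $q_0 \leftarrow$ pick from $Q_0$ with probability $1/|Q_0|$; 
 2  $q \leftarrow q_0$; $\sigma \leftarrow q_0$; $\Theta_\sigma \leftarrow \Theta$; $depth \leftarrow 0$; $a \leftarrow 0$; 
 3  while $hash(q) = \bot$ do 
 4      $depth \leftarrow depth + 1$; 
 5      $hash(q) \leftarrow depth$; 
 6      if $q \in A$ then 
 7          $a \leftarrow depth$; 
 8      end 
 9      $Succ_\sigma \leftarrow \{q' \in Q \mid (q, q') \in \Delta \wedge (\gamma(q, q') \cap \Theta_\sigma) \neq \emptyset\}$; 
10      $q' \leftarrow$ pick from $Succ_\sigma$ with probability $1/|Succ_\sigma|$; 
11      $\sigma \leftarrow \sigma q'$; 
12      $\Theta_\sigma \leftarrow \Theta_\sigma \cap \gamma(q, q')$; 
13      $q \leftarrow q'$; 
14  end 
15  return $(\sigma, \Theta_\sigma, hash(q) \le a)$ 

Algorithm 1: Random Lasso Sampling 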


one variant; otherwise, we would sample a behaviour that does not actually exist. 
The set of variants able to execute a given lasso are those that can execute all 
its transitions, i.e. the intersection of all $\gamma(q,q')$ met along the transitions of this 
lasso. More generally, we define the lasso sample space of an FBA as follows. 


Definition 2 Let $fba = (Q, \Delta, Q_0, A, \Theta, \gamma)$ be a featured Büchi automaton. 
The lasso sample space $L$ of $fba$ is the set of executions $\sigma = q_0 \ldots q_n$ such that 
$q_0 \in Q_0$, $(q_i, q_{i+1}) \in \Delta$ for all $0 \le i \le n-1$, $\bigcap_{0 \le i \le n-1} \gamma(q_i, q_{i+1}) \neq \emptyset$, $q_j = q_n$ 
for some $0 \le j \le n-1$ and $a \neq b \Rightarrow q_a \neq q_b$ for all $0 \le a, b \le n-1$. Moreover, 
$\sigma$ is said to be an accepting lasso if $q_a \in A$ for some $j \le a \le n$. 


Algorithm 1 formalizes the sampling of lassos in a deadlock-free FBA.⁵ After 
randomly picking an initial state (Line 1), we walk through the state space by 
randomly choosing, at each iteration, a successor state among those available 
(Lines 3-14). Throughout the search, we maintain the set of variants $\Theta_\sigma$ that 
can execute $\sigma$ so far (Line 12). Then, we use this set as a filter when selecting 
successor states, so as to make sure that $\sigma$ remains executable by at least one 
variant. At Line 9, $Succ_\sigma$ is the set of successors $q'$ of $q$ (last state of $\sigma$) that can 
be reached. We stop the search as soon as we reach a state that was previously 
visited (Line 3). If this state was visited no later than the last accepting state, it means 
that the sampled lasso is accepting (Line 15). 
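
The random walk described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ProVeLines implementation: variant sets are explicit Python sets instead of symbolic encodings, the FBA is assumed deadlock-free, and the labelling `GAMMA` is a guessed reconstruction of Figure 1 that only matches what the text says about it.

```python
import random

BOTH, V1, V2 = frozenset({"v1", "v2"}), frozenset({"v1"}), frozenset({"v2"})
# Assumed reconstruction of Figure 1: transition -> variants able to execute it.
GAMMA = {
    (1, 2): V2, (2, 1): V2, (1, 3): BOTH,
    (3, 4): BOTH, (4, 5): BOTH, (5, 3): BOTH,
    (3, 6): V2, (6, 5): V2,
    (3, 7): V1, (7, 8): V1, (8, 7): V1,
}


def sample_lasso(gamma, initial, accepting, variants, rng=random):
    """Sample one lasso; return (lasso, variants able to run it, is_accepting)."""
    q = rng.choice(sorted(initial))           # uniform choice of the initial state
    sigma, theta = [q], frozenset(variants)
    visited = {}                              # state -> depth of first visit
    depth, last_accepting = 0, 0
    while q not in visited:
        depth += 1
        visited[q] = depth
        if q in accepting:
            last_accepting = depth
        # Successors reachable by at least one variant still able to run sigma.
        succ = sorted(t for (s, t), vs in gamma.items() if s == q and vs & theta)
        nxt = rng.choice(succ)                # raises IndexError on a deadlock
        sigma.append(nxt)
        theta &= gamma[(q, nxt)]
        q = nxt
    # The cycle starts at depth visited[q]; it is accepting if an accepting
    # state was seen at that depth or later.
    return sigma, theta, visited[q] <= last_accepting


rng = random.Random(42)
print(sample_lasso(GAMMA, {1}, {5}, {"v1", "v2"}, rng))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which is convenient when comparing sampling strategies.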


5 We assume that no variant may remain stuck in a state without outgoing transition 
that this variant can execute. Should this happen, we assume that the variant self- 
loops in the state wherein it is stuck, yielding an immediate lasso. 

A motivated criticism [28] of the use of random walks to sample lassos is that 
shorter lassos receive a higher probability to be sampled. To counterbalance 
this, we implemented a heuristic named multi-lasso [20]. It consists of ignoring 
backward transitions that do not lead to an accepting lasso if there are still 
forward transitions to explore. This is achieved by modifying Line 9 such that 
backward transitions leading to a non-accepting lasso are not considered in the 
successor set. 

Assuming a uniform selection of outgoing transitions from each state, one 
can compute the probability that a random walk samples any given lasso from 
the sample space. 


Definition 3 The probability $P(\sigma)$ of a lasso $\sigma = q_0 \ldots q_n$ is inductively defined 
as follows: $P[q_0] = |Q_0|^{-1}$ and $P[q_0 \ldots q_j] = P[q_0 \ldots q_{j-1}] \times |Succ_{q_0 \ldots q_{j-1}}|^{-1}$. 


In the absence of deadlock, $(L, \mathcal{P}(L), P)$ defines a probability space. Probability 
spaces on infinite executions are by no means a trivial construction (see 
e.g. [9]). Nevertheless, the proof of this proposition is similar to its counterpart 
in Büchi automata [20] and is therefore omitted. It derives from the observation 
that the lasso sample space is composed of non-subsuming finite prefixes of all 
infinite paths of the automaton. 

Let us consider an example. In the FBA from Figure 1, there are two non-accepting 
lassos ($l_1 = (1,2,1)$ and $l_2 = (1,3,7,8,7)$) and two accepting lassos 
($l_3 = (1,3,4,5,3)$ and $l_4 = (1,3,6,5,3)$). Both variants can execute lasso $l_3$, 
while only $v_1$ can execute $l_2$ and only $v_2$ can execute $l_1$ and $l_4$. The probability 
of sampling $l_1$ is $\frac{1}{2}$, whereas $P[l_2] = P[l_3] = P[l_4] = \frac{1}{6}$. Thus, the probability of 
sampling a counterexample executable by $v_2$ is $\frac{1}{3}$, whereas it is only $\frac{1}{6}$ for $v_1$. 

Next, we characterize the relationship between this probability space and any 
individual variant $v$. Let $L_v$ be the set of lassos executable by $v$. Since $L_v \subseteq L$, 
the probability $p_v$ to sample such a lasso is $\sum_{\sigma \in L_v} P(\sigma)$. Note that $p_v$ can 
be different from the probability of sampling an accepting lasso from the 
automaton modelling the behaviour of $v$ only (i.e. the projection of the FBA 
onto $v$). This is because, in the FBA, the probability of selecting an outgoing 
transition from a given state is assigned uniformly regardless of the number of 
variants able to execute that transition. This balance-breaking effect grows 
as the variants differ more in their numbers of unique executions. 

Let $\sigma = q_0 \ldots q_n$ be a lasso in $L_v$. Then $P_v(\sigma)$ is inductively defined as follows: 
$P_v[q_0] = P[q_0]$ and $P_v[q_0 \ldots q_j] = P_v[q_0 \ldots q_{j-1}] \times |\{(q_{j-1}, q) \in \Delta_v : q \in Q\}|^{-1}$ 
where $\Delta_v = \{(q,q') \in \Delta : v \in \gamma(q,q')\}$. In our example, $P_{v_1}[l_3] = \frac{1}{2}$, as opposed 
to $P[l_3] = \frac{1}{6}$. This implies that it is more likely to sample an accepting lasso 
executable by $v_1$ from its projection in one trial than it is from the whole FBA 
in two trials. This illustrates the case where merging the state spaces of the 
variants can have a negative impact on the capability to find bugs specific to 
one variant. 
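
The probabilities in this example can be recomputed exactly by enumerating all lassos of the random walk. The sketch below does so with rational arithmetic, both for the whole FBA and for its projection onto $v_1$. The labelling `GAMMA` is an assumed reconstruction of Figure 1 (the figure itself is not reproduced here), chosen to agree with the statements in the text; the function name is hypothetical, and the multi-lasso heuristic is not modelled.

```python
from fractions import Fraction

BOTH, V1, V2 = frozenset({"v1", "v2"}), frozenset({"v1"}), frozenset({"v2"})
# Assumed reconstruction of Figure 1: transition -> variants able to execute it.
GAMMA = {
    (1, 2): V2, (2, 1): V2, (1, 3): BOTH,
    (3, 4): BOTH, (4, 5): BOTH, (5, 3): BOTH,
    (3, 6): V2, (6, 5): V2,
    (3, 7): V1, (7, 8): V1, (8, 7): V1,
}


def lasso_probabilities(gamma, initial, variants):
    """Return {lasso: probability} for the uniform, variant-filtered random walk."""
    result = {}

    def walk(path, theta, prob):
        q = path[-1]
        if q in path[:-1]:                    # a state repeats: the lasso is closed
            result[tuple(path)] = prob
            return
        succ = sorted(t for (s, t), vs in gamma.items() if s == q and vs & theta)
        for t in succ:
            walk(path + [t], theta & gamma[(q, t)], prob / len(succ))

    for q0 in sorted(initial):
        walk([q0], frozenset(variants), Fraction(1, len(initial)))
    return result


l3 = (1, 3, 4, 5, 3)
whole = lasso_probabilities(GAMMA, {1}, {"v1", "v2"})
only_v1 = lasso_probabilities({t: vs for t, vs in GAMMA.items() if "v1" in vs}, {1}, {"v1"})
two_trials = 1 - (1 - whole[l3]) ** 2         # l3 found within two FBA samples
print(whole[l3], only_v1[l3], two_trials)
```

Under these assumptions, one trial on the projection succeeds with a higher probability than two trials on the whole FBA, which is the comparison made in the text.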

Thus, sampling lassos from the FBA allows finding one counterexample executable 
by multiple products but it introduces a bias. Overall, it tends to decrease 
the probability of sampling lassos from variants that have a smaller state space. 
This can impact the results and parameter choices of SMC, like the number of 
samples required to get confident results and the associated Type-1 error. 


3.2 Hypothesis Testing 


Remember that addressing the model checking problem for VIS requires finding 
a counterexample for every buggy variant $v$. Thus, one must sample a number 
$M$ of lassos such that one gets an accepting lasso for each such buggy variant 
with a confidence ratio $\delta$. Let $fba$ be a featured Büchi automaton, $v$ be a variant 
and $p_v = \sum_{\sigma \in L_v^{\omega}} P(\sigma)$ where $L_v^{\omega}$ is the set of accepting lassos executable by $v$. 
Let $Z_v$ denote a Bernoulli random variable such that $Z_v = 1$ with probability 
$p_v$ and $Z_v = 0$ with probability $q_v = 1 - p_v$. Now, let $X_v$ denote the geometric 
random variable with parameter $p_v$ that encodes the number of independent 
samples required until $Z_v = 1$. For a set of variants $V = \{v_1, \ldots, v_{|V|}\}$, we have 
that $X_{v_1}, \ldots, X_{v_{|V|}}$ are not independent since one may sample a lasso executable 
by more than one variant. 

We define $X = \max_{i=1..|V|} X_{v_i}$. We aim to find a number of samples $M$ such 
that $P[X \le M] \ge 1 - \delta$ for a confidence ratio $\delta$. This is analogous to the 
coupon collector’s problem [4], which asks how many boxes are needed to collect 
one instance of every coupon placed randomly in the boxes. It differs from the 
standard formulation in that the probabilities of occurrence of coupons are neither 
independent nor uniform, and a single box can contain 0 to $|V|$ coupons. Even 
for simpler instances of the coupon problem, computing $P[X \le M]$ analytically 
is known to be hard [33]. Thus, existing solutions rather characterise a lower 
bound and an upper bound. We follow this approach as well. 
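
For very small families, $P[X \le M]$ can still be computed exactly by inclusion-exclusion over subsets of variants, which helps to build intuition for the bounds that follow. The sketch below is an illustration only: its cost is exponential in the number of variants, and the outcome table is derived from an assumed reconstruction of the Figure 1 example (lasso probabilities $\frac{1}{2}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}$). All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def prob_all_found(outcomes, variants, m):
    """P[X <= m]: every variant gets a counterexample within m samples.

    outcomes is a list of (probability, variants for which the sampled lasso
    is a counterexample); the probabilities must sum to 1.
    """
    names = sorted(variants)
    total = Fraction(0)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            chosen = set(subset)
            # Probability that one sample covers none of the chosen variants.
            miss = sum((p for p, cov in outcomes if not (cov & chosen)), Fraction(0))
            total += (-1) ** k * miss ** m
    return total


def exact_min_samples(outcomes, variants, delta):
    """Smallest m with P[X <= m] >= 1 - delta (each variant must be coverable)."""
    m = 1
    while prob_all_found(outcomes, variants, m) < 1 - delta:
        m += 1
    return m


# Assumed outcome table for Figure 1: l1 and l2 are non-accepting,
# l3 is a counterexample for both variants, l4 only for v2.
OUTCOMES = [
    (Fraction(1, 2), frozenset()),
    (Fraction(1, 6), frozenset()),
    (Fraction(1, 6), frozenset({"v1", "v2"})),
    (Fraction(1, 6), frozenset({"v2"})),
]
print(prob_all_found(OUTCOMES, {"v1", "v2"}, 2))
print(exact_min_samples(OUTCOMES, {"v1", "v2"}, Fraction(1, 20)))
```

In this toy case the answer is driven entirely by $v_1$, whose only counterexample is the rarest one, which matches the intuition given for Lemma 4.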


3.3 Lower Bound (Minimum Number of Samples) 


To compute a lower bound for the number of samples to draw, we transform the 
family-based SMC problem to a simpler form (in terms of verification effort). 
We divide our developments into two parts. First, we show that assigning equal 
probabilities $p_{v_i}$ to every variant $v_i$ (obtained by averaging the original probability 
values) reduces the number $M$ of required samples. As a second step, we show 
that assuming that all variants share all their executions also reduces $M$. Doing 
so, we reduce the family-based SMC problem to its single-system counterpart, 
which allows us to obtain the desired lower bound. 


Averaged probabilities. Let $p_{avg} = \frac{1}{|V|} \sum_{i=1..|V|} p_{v_i}$ and $X_{even}$ be the counterpart 
of $X$ where all probabilities $p_{v_i}$ have been replaced by $p_{avg}$. 


Lemma 4 For any number $N$, it holds that $P[X_{even} \le N] \ge P[X \le N]$. 


Intuitively, the value of $X$ depends mainly on the variants whose accepting lassos 
are rarer. By averaging the probability of sampling accepting lassos, we raise the 
likelihood to get those rarer lassos and, thus, lower the number of samples required 
to get an accepting lasso for all variants. Shioda [33] proves a similar result 
for the coupon collector problem. He does so by showing that the vector $p_{even}$ 
majorizes $p = \{p_{v_1}, \ldots, p_{v_{|V|}}\}$ and that the ccdf⁶ of $X$ is a Schur-concave function 
of the sampling probabilities. Even though our case is more general than the 
non-uniform coupon collector’s problem, the result of Lemma 4 still holds. Indeed, we 
observe that the theoretical proof of [33] (a) does not assume the independence 
of the random variables $Z_{v_i}$; (b) still applies to the dependent case; and (c) 
supports the case where the sum of the probability values $p_{v_i}$ is less than one. 


Maximized commonalities. Next, let $X_{all}$ be the particular case of $X_{even}$ 
where all accepting lassos are executable by all variants and are sampled with 
probability $p_{avg}$. Thus, the number of samples to find an accepting lasso for 
every variant is reduced to the number of samples required to find any accepting 
lasso. 


Lemma 5 It holds that $P[X_{all} \le N] \ge P[X_{even} \le N]$.


Moreover, let us note that $X_{all}$ is a geometric random variable with parameter
$p_{avg}$. This reduces our problem to sampling an accepting lasso in a classical Büchi
automaton and allows us to reuse the results of Grosu and Smolka [20].


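Lemma 6 For a confidence ratio $\delta$ and an error margin $\varepsilon$, it holds that
$p_{avg} \ge \varepsilon \Rightarrow P[X_{all} \le M] \ge P[X_{all} \le N] = 1 - \delta$,
where $M = \frac{\ln(\delta)}{\ln(1 - \varepsilon)}$ and $N = \frac{\ln(\delta)}{\ln(1 - p_{avg})}$.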


This leads us to the central result of this section. 


Theorem 7 Assuming that $p_{avg} \ge \varepsilon_{avg}$ for a given error margin $\varepsilon_{avg}$, a lower
bound for the number of samples required to find an accepting lasso for each
buggy variant is $M = \frac{\ln(\delta)}{\ln(1 - \varepsilon_{avg})}$, with a Type-1 error bounded by $\delta$.
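
The lower bound can be checked numerically. The following minimal Python sketch evaluates $\ln(\delta) / \ln(1 - \varepsilon)$ and the cumulative distribution of a geometric random variable, using the values $\delta = 0.05$ and $\varepsilon_{avg} = 0.01$ from Section 4.1. The function names are ours and purely illustrative; they are not part of the prototype tool.

```python
import math


def geometric_sample_bound(delta: float, eps: float) -> float:
    """Return ln(delta) / ln(1 - eps), the sample bound of Lemma 6 / Theorem 7."""
    if not (0.0 < delta < 1.0 and 0.0 < eps < 1.0):
        raise ValueError("delta and eps must lie strictly between 0 and 1")
    return math.log(delta) / math.log(1.0 - eps)


def geometric_cdf(p: float, n: float) -> float:
    """P[X <= n] for a geometric random variable X with success probability p."""
    return 1.0 - (1.0 - p) ** n


# Values taken from Section 4.1: delta = 0.05 and eps_avg = 0.01.
m = geometric_sample_bound(delta=0.05, eps=0.01)
print(f"M = {m:.2f}")
print(f"P[X_all <= M] when p_avg = eps: {geometric_cdf(0.01, m):.4f}")
print(f"P[X_all <= M] when p_avg = 0.02: {geometric_cdf(0.02, m):.4f}")
```

When $p_{avg}$ equals the error margin, the confidence is exactly $1 - \delta$; any larger $p_{avg}$ only increases it, which is the content of Lemma 6.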


3.4 Upper Bound (Maximum Number of Samples) 


We follow a similar two-step process to characterise an upper bound for M. In the 
first step, we replace the probabilities $p_{v_i}$ of every variant by their minimum.
In the second step, we alter the model so that the variants have no common 
behaviour. Then we show that, given a desired degree of confidence, the obtained 
model requires a higher number of samples than the original one. 


Minimum probability. Let $p_{min} = \min_{i=1..|V|} p_{v_i}$ and $X_{min}$ be the
counterpart of X where all probabilities $p_{v_i}$ have been replaced by $p_{min}$. The ccdf
of X being a decreasing function of the sampling probabilities, we have that
$P[X_{min} \le N] \le P[X \le N]$.


6 ccdf stands for complementary cumulative distribution function 
No common counterexamples. Let $\{(X_{indep})_{v_i}\}$ be a set of independent geometric
random variables with parameter $p_{min}$ and let $X_{indep} = \max_i (X_{indep})_{v_i}$.
$X_{indep}$ actually encodes the number of samples required to get a counterexample
for all buggy variants when those have no common counterexamples. We have
that $P[X_{indep} \le N] \le P[X_{min} \le N]$, since the number of samples to perform
cannot be reduced by sampling a counterexample executable by multiple variants.
Now, let us note that $X_{indep}$ is an instance of the uniform coupon problem
with |V| coupons to collect. A lower bound for $P[X_{indep} \le M]$ is known to
be $1 - |V| \times (1 - p_{min})^M$ [33]. Assuming $p_{min}$ greater than some error margin
$\varepsilon_{min}$, we have $P[X_{indep} \le M] \ge 1 - |V| \times (1 - \varepsilon_{min})^M$. Setting a confidence
ratio $\delta$, we want to find an M such that $P[X_{indep} \le M] \ge 1 - \delta$. By solving
$1 - |V| (1 - \varepsilon_{min})^M = 1 - \delta$, we obtain $M = \frac{\ln(\delta) - \ln(|V|)}{\ln(1 - \varepsilon_{min})}$, which we can use as
the upper bound for the number of samples to perform.


Theorem 8 Assuming that $p_{min} \ge \varepsilon_{min}$ for a given error margin $\varepsilon_{min}$, an
upper bound for the number of samples required to find an accepting lasso for
each buggy variant is $M = \frac{\ln(\delta) - \ln(|V|)}{\ln(1 - \varepsilon_{min})}$, with a Type-1 error bounded by $\delta$.
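
The closed form of the upper bound can be sanity-checked by plugging it back into the expression $1 - |V| (1 - \varepsilon_{min})^M$. The Python sketch below does so; the parameter values (8 variants, $\varepsilon_{min} = 0.01$) are hypothetical and chosen only for illustration, and the function names are ours.

```python
import math


def upper_bound_samples(delta: float, eps_min: float, n_variants: int) -> float:
    """Return (ln(delta) - ln(|V|)) / ln(1 - eps_min), the bound of Theorem 8."""
    if not (0.0 < delta < 1.0 and 0.0 < eps_min < 1.0 and n_variants >= 1):
        raise ValueError("expected 0 < delta < 1, 0 < eps_min < 1, n_variants >= 1")
    return (math.log(delta) - math.log(n_variants)) / math.log(1.0 - eps_min)


def coverage_lower_bound(eps_min: float, n_variants: int, m: float) -> float:
    """Return 1 - |V| * (1 - eps_min)**m, the bound on P[X_indep <= m]."""
    return 1.0 - n_variants * (1.0 - eps_min) ** m


# Hypothetical parameters, for illustration only.
delta, eps_min, n_variants = 0.05, 0.01, 8
m = upper_bound_samples(delta, eps_min, n_variants)
print(f"M = {m:.2f}")
# At M samples the bound meets the target confidence 1 - delta.
print(f"bound at M: {coverage_lower_bound(eps_min, n_variants, m):.4f}")
# Rounding M up to an integer can only increase the bound.
print(f"bound at ceil(M): {coverage_lower_bound(eps_min, n_variants, math.ceil(m)):.4f}")
```

With a single variant, the formula reduces to the geometric bound $\ln(\delta) / \ln(1 - \varepsilon_{min})$ used for the lower bound.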


4 Empirical Study 


4.1 Objectives and Methodology 


One can regard SMC as a means of speeding up verification while risking miss- 
ing counterexamples. Our first question studies this trade-off and analyses the 
empirical Type-1 error rate. More precisely, we compute the detection rate of 
our family-based SMC method, expressed as the number of buggy variants that 
it detects over the total number of buggy variants. 
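
As a minimal sketch of this metric, the detection rate can be computed from two sets of variant identifiers. The identifiers and the function name below are hypothetical.

```python
from typing import Set


def detection_rate(detected: Set[str], buggy: Set[str]) -> float:
    """Fraction of the truly buggy variants for which a counterexample was found."""
    if not buggy:
        raise ValueError("the detection rate is undefined without buggy variants")
    # Only variants that are truly buggy count as detections.
    return len(detected & buggy) / len(buggy)


# Hypothetical ground truth (from exhaustive model checking) and SMC findings.
buggy = {"v1", "v2", "v3", "v4"}
detected = {"v1", "v3"}
print(detection_rate(detected, buggy))
```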


RQ1 What is the empirical buggy variant detection rate of family-based SMC? 


We compute the detection rate for different numbers M of samples lying between 
the lower and upper bounds as characterised in Section 3. To get the ground 
truth (i.e. the true set of all buggy variants), we execute the exhaustive LTL 
model checking algorithms for FTS designed by Classen et al. [11]. For the lower 
bound, we assume that the average probability to sample an accepting lasso for 
any variant is higher than $\varepsilon_{avg} = 0.01$. Setting a confidence ratio $\delta = 0.05$ yields
$\frac{\ln(\delta)}{\ln(1 - \varepsilon_{avg})} = 298$. We round up and set M = 300 as our lower bound. For the upper
bound, we assume that the minimum probability to sample a counterexample in
a buggy variant is higher than $\varepsilon_{min} = 3 \cdot 10^{-4}$ and also set $\delta = 0.05$. For a model
with 256 variants⁷, this yields $M = \frac{\ln(\delta) - \ln(|V|)}{\ln(1 - \varepsilon_{min})} = 18478$. For convenience,
we round it up to 19,200 = $300 \cdot 2^6$. In the end, we successively set M to
300, 600, ..., 19200 and observe the detection rates.
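
The resulting sequence of sample budgets doubles at each step. A short Python sketch (the function name is ours) makes the seven values explicit:

```python
from typing import List


def sample_schedule(lower: int, upper: int) -> List[int]:
    """Budgets obtained by doubling `lower` until `upper` is exceeded."""
    sizes = []
    m = lower
    while m <= upper:
        sizes.append(m)
        m *= 2
    return sizes


schedule = sample_schedule(300, 19200)
print(schedule)
```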

7 256 is the maximum number of variants in our case studies

Next, we investigate a complementary scenario where the engineer has a
limited budget of samples to check. We study the smallest budget required by
SMC to detect all buggy variants (in the cases where it can indeed detect all of
them) and the computation resources it incurs compared to an exhaustive
search of the state space. Thus, our second question is:


RQ2 How efficient is SMC with a minimal sample budget compared to an
exhaustive search? 


Finally, we compare family-based SMC with the alternative of sampling in 
each variant’s state space separately. We name this alternative method enumer- 
ative SMC. Hence, our last research question is: 


RQ3 How does family-based SMC compare with enumerative SMC?


As before, we compare the two techniques w.r.t. detection rate. We set M to 
the same values as in RQ1. In enumerative SMC, this means that each variant 
receives a budget of $\frac{M}{|V|}$ samples, where M is the number of samples used in
family-based SMC and V is the set of variants. 


4.2 Experimental Setup 


Implementation. We implemented our SMC algorithms (family-based and enumerative- 
based) in a prototype tool. The tool takes as input an FTS, an LTL formula and 
a sample budget. Then it performs SMC until all samples are checked or until 
all variants are found to violate the formula. To compare with the exhaustive 
search we use ProVeLines [17], a state-of-the-art model checker for VIS. 


Dataset. We consider three systems that were used in past research to evaluate 
VIS model checking algorithms [11,14,16]. Table 1 summarizes the characteristics 
of our case studies and their related properties. The first system is a minepump 
system [11,24] with 128 variants. The underlying FTS comprises 250,561 states, 
while the state space of all variants taken individually reaches 889,124 states. 
The second model is an elevator model inspired by Plath and Ryan [29]. It is 
composed of eight configuration options, which can be combined into 256 differ- 
ent variants, and its FTS has 58,945,690 states to explore. The third and last is 
a case study inspired by the CCSDS File Delivery Protocol (CFDP) [13], a real- 
world configurable spacecraft communication protocol [5]. The FTS modelling 
the protocol consists of 1,732,536 states to explore and 56 variants (individually 
totalling 2,890,399 states). We discarded the properties that are satisfied by all 
variants. Those are: Minepump #17, #33, #40; Elevator #13, CFDP #5. In- 
deed, these properties are not relevant for RQ1 and RQ3 since SMC is trivially 
correct in such cases. As for RQ2, any small sample budget would return correct 
results while being more efficient than the exhaustive search. This leaves us with 
59 properties. 


Infrastructure and repetitions. We run our experiments on a MacBook Pro 2018 
with a 2.9 GHz Core-i7 processor and macOS 10.14.5. To account for random 
variations in the sampling, we execute 100 runs of each experiment and compute 
the average detection rates for each property. 
Table 1: Models and LTL formulae used in our experiments.


Minepump (250,561 FTS states, 128 valid variants) 


(OO (stateReady ^A highWater ^ userStart)) 
-(O0stateReady) 


-(OOstateRunning) 


(OO stateStopped) 
-(O0stateMethanestop) 


-=(OOstateLowstop) 


-3(OOreadCommand) 
-(OOreadAlarm) 


-(OreadLevel) 


J 


OreadCommand) ^ (O9readAlarm) A (QOreadLevel)) 
=(O0pumpOn) 


=(O09-pumpOn) 


1 


OpumpOn) A (AO-=pumpOn)) 


=(O9methane) 


-=(O-methane) 
-((OOmethane) A (AO7>methane)) 


(apumpOn V stateRunning) 


(methane = (QstateMethanestop)) 
(methane = 7(QstateMethanestop)) 


(pumpOn V >methane) 


((pumpOn A methane) = O7>pumpOn) 
((AOreadCommand) A (O)readAlarm) A (OOreadLevel)) ((pumpOn A methane) = O7>pumpOn) 


=~ 


=00 (pumpOn ^ methane 


((AOreadCommand) A (O)readAlarm) A (OOreadLevel)) = ~OO(pumpOn A methane) 
((apumpOn ^ methane A O>methane) => ((apumpOn)U-methane)) 


((highW ater A smethane) = QpumpOn) 


-(O(highW ater A smethane)) 
((AOreadCommand) A (O)readAlarm) A (OOreadLevel)) = (O((highW ater A smethane) = “pumpOn)) 


O((highW ater A smethane) = 7QpumpOn) 


300 (sapumpOn A highW ater) 


((QOreadCommand) A (OdreadAlarm) A (OOreadLevel)) = (=00(=pumpOn A highW ater)) 


=0O(-pumpOn A >methane ^ highW ater) 
((AQOreadCommand) A (O)readAlarm) A (OOreadLevel)) = (=O0(-pumpOn A smethane ^ highWater)) 


((pumpOn A highWater ^ OlowWater) = (pumpOnUlowW ater)) 


30(pumpOn A highW ater ^A OlowW ater) 
(lowW ater > (Q>pumpOn)) 


((AOreadCommand) A (O)readAlarm) A (OOreadLevel)) = (A(lowW ater = (Q>pumpOn))) 


300 (pumpOn A lowW ater) 
((AOreadCommand) A (O)readAlarm) A (QOreadLevel)) = (-O0(pumpOn A lowW ater)) 


((apumpOn A lowWater ^A OhighWater) = ((apumpOn)UhighW ater)) 


30(apumpOn ^ lowWater A QhighW ater) 


Elevator (58,945,690 FTS states, 256 valid variants) 


aDOprogress 


30 f0 v ~DOf1 v -09 f2 v -O0 f3 


aH p0at0 V =D) p0atl V ~DOp0at2 V ~DOp0at3 


1 
2 
3 
4 


(fb2 = (Of2)) 
Oprogress = (A(fb2 = (0 f2))) 


progress = (O( fb2 = (O(f2 A dopen)))) 
progress = (aOOf2) 


(progress V waiting) > (790 f2) 


(progress V waiting) > (~OOf0) 
O((cbO V cbl V cb2 V cb3) A a(p0in V plin) A dclosed) 
progress = (7Ldclosed) 


al 


progress = (OO (p0to3 A dclosed)) 
progress = (> )Ldopen) 


(progress V waiting) = (~90dopen) 


((AO(progress V waiting)) A (OO(fb0 V fb1 v fb2 v fb3))) = (-OUdopen) 


3120(p0in A plin A dclosed) 


18 


=O (p0in A delosed) 
progress = (~OU(p0in A dclosed)) 


CFDP (1,801,581 FTS states, 56 valid variants) 


#1 
#2 
#3 
#4 

5 


0 fileReceived 

(Qeof Received) >  fileReceived 

((Qeof Received) A (QnakReceived)) = Q fileReceived 
((Qeof Received) A (Onak Received)) => Q fileReceived 
(finSend = fileReceived) 
[Figure 2: boxplots in three panels, (a) Minepump (family-based SMC), (b) Elevator
(family-based SMC), (c) CFDP (family-based SMC); x-axis ticks: 300, 600, 1200, 2400,
4800, 9600, 19200.]

Fig. 2: Detection rate of the buggy variants achieved by our SMC method, in the
three case studies and using different sample sizes. In each figure, the x-axis is
the number of samples.


5 Results 


5.1 RQ1: Detection Rate 


Figure 2 shows as boxplots, for each case study and over all checked properties, 
the percentage of buggy variants for which family-based SMC found a counterexample.
We provide those boxplots for different numbers M of samples.

In the case of Minepump and Elevator, the median detection percentage is 
100% starting from M = 1200 and M = 600, respectively. Further increasing the 
number of samples raises the 0.25 percentile. In Minepump and for M = 1200, 
there are 18/41 properties for which SMC could not detect all buggy variants. 
Increasing M improves significantly the percentage of buggy variants detected by 
SMC for all these properties, although there remain undetected variants in 15 
of them even with M = 19,200. This illustrates that our assumption regarding
$p_{min}$ was inappropriate for those properties: counterexamples are rarer than
we imagined. The elevator study yields even better results: at M = 600, SMC
detects all buggy variants for 10/18 properties; this number becomes 14/18 at 
M = 2,400 and 17/18 at M = 9,600. As for the remaining property, SMC with 
M = 19,200 detects 50% of the variants on average and we observe that this 
percentage consistently increases as we increase M. 

The results for CFDP are mixed: while the median percentage goes beyond
80% as soon as M = 1,200, it tends to saturate when increasing the number of
samples. The 0.25 percentile still increases but also seems to reach an
asymptotic behaviour in the trials with the highest M. A detailed look at the results
reveals that for M > 1,200, there are only two properties for which SMC cannot
identify all buggy variants: #3 (9 buggy variants) and #4 (4 buggy variants). At
M = 19,200, SMC detects on average 5.43 and 3.14 buggy variants for those two
properties, respectively.
Further doubling M raises these numbers to 6.36 and 3.26. This indicates that
the non-detected variants have few counterexamples, which are rare due to the
smaller state space of those variants. The computation resources required by SMC
to find such rare counterexamples with high confidence are higher than those needed
to model-check the undetected variants thoroughly. An alternative would be to direct
SMC towards rare executions, leaning on techniques such as [10, 21].


SMC can detect all buggy variants for 41 properties out of 59. For the
remaining properties, however, SMC was unable to find the rare counterexamples
of some buggy variants. This calls for new dedicated heuristics to
sample those rare executions.


5.2 RQ2: Efficiency 


Next, we check how much execution time SMC can spare compared to the ex- 
haustive search. Results are shown in Table 2. Overall, we see that SMC holds 
the potential to greatly accelerate the discovery of all buggy variants, achieving 
a total speedup of 526%, 1891% and 356% for Minepump, Elevator and CFDP, 
respectively. For more than half of the properties, the smallest number of sam- 
ples we tried (i.e. 300) was sufficient for a thorough detection. Those properties 
are actually violated by all variants. The fact that SMC requires such a small
number of samples means that the same bug lies in all the variants (as opposed to 
each variant violating the property in its own way). On the contrary, Minepump 
property #31 is also violated by all variants but requires a much higher sample 
number, which illustrates the presence of variant-specific bugs. 
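
The speedup figures are consistent with reading them as the execution time of the exhaustive search divided by the execution time of SMC, expressed as a percentage. The Python sketch below recomputes the three totals and the Elevator #16 entry from the times reported in Table 2; the function name is ours, and the rounding to the nearest integer is an assumption.

```python
def speedup_percent(full_time: float, smc_time: float) -> int:
    """Exhaustive-search time over SMC time, as a rounded percentage."""
    if smc_time <= 0:
        raise ValueError("SMC time must be positive")
    return round(100.0 * full_time / smc_time)


# (Full time, SMC time) pairs taken from the Total rows of Table 2.
totals = {
    "Minepump": (18.93, 3.60),
    "Elevator": (443.74, 23.46),
    "CFDP": (2.56, 0.72),
}
for model, (full_time, smc_time) in totals.items():
    print(model, f"{speedup_percent(full_time, smc_time)}%")

# Elevator property #16, the example discussed in the text.
print("Elevator #16", f"{speedup_percent(182.25, 4.02)}%")
```

A value below 100% (for instance Minepump #31) therefore means that SMC was slower than the exhaustive search for that property.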

Interestingly, the benefits of SMC are higher in the Elevator case (the largest 
of the three models), achieving speedups of up to 16,575%. A likely explana- 
tion is that the execution paths of the Elevator model share many similarities, 
which means that a single bug can lead to multiple failed executions. By sam- 
pling randomly, SMC avoids exploring thoroughly a part of the state space that 
contains no bug and, instead, increases the likelihood to move to interesting 
Table 2: Least numbers of samples (in our experiments) that allowed detecting all
buggy variants and corresponding execution time. Full refers to an exhaustive
search of the search space. Only properties that are violated by at least one
variant and for which SMC found all buggy variants are shown.


                             SMC                 Full
Property      # Samples  # States   Time   # States     Time  Speedup
Minepump #1         600     25332   0.18      92469     1.33     739%
Minepump #2         300     12553   0.10      24908     1.06    1060%
Minepump #4         300      2383   0.03     103933     3.10   10333%
Minepump #5        1200     48714   0.32      76040     1.03     322%
Minepump #7         300      2469   0.03      18482     0.21     700%
Minepump #8         300      2757   0.03       4646     0.05     167%
Minepump #9         300      2758   0.03       8263     0.08     267%
Minepump #10        600     15191   0.11      55936     0.58     527%
Minepump #12        300      2356   0.03        811     0.02      67%
Minepump #14        300      2915   0.04        989     0.02      50%
Minepump #15        300      2389   0.03       2673     0.05     167%
Minepump #16        300      4102   0.04       1917     0.03      75%
Minepump #18        300      2604   0.03        125     0.01      33%
Minepump #19        600     25027   0.18     143540     2.69    1494%
Minepump #20        300      3864   0.03         40     0.01      33%
Minepump #25       2400     67620   0.50     346935     6.12    1224%
Minepump #26        300      2708   0.03       4382     0.05     167%
Minepump #27        300      2450   0.03       3702     0.04     133%
Minepump #28       2400     58382   0.43      99780     1.28     298%
Minepump #30        300       300   0.03       3648     0.05     167%
Minepump #31       9600    165802   1.29      61185     1.03      80%
Minepump #32        300      2684   0.03       4110     0.05     167%
Minepump #41        300      5732   0.05       3886     0.04      80%
Total                      461092   3.60    1062400    18.93     526%
Elevator #1         300      4371   0.03     105883     0.52    1733%
Elevator #2         600    226813   1.14     437252     2.48     218%
Elevator #3        4800   1736781   7.67   14822853   103.22    1346%
Elevator #4         300      4403   0.04    1194568     6.63   16575%
Elevator #5         300      7719   0.05    1305428     7.76   15520%
Elevator #6         300      7061   0.05    1202204     6.89   13780%
Elevator #7         600     25021   0.12     732684     4.33    3608%
Elevator #8         600     26120   0.13     204934     1.19     915%
Elevator #9         300      3142   0.03      39086     0.28     933%
Elevator #11        300      3278   0.03         91     0.02      67%
Elevator #12       9600   1502419   6.53    1954924    11.12     170%
Elevator #14       2400    141753   0.61    7889584    52.88    8669%
Elevator #15       2400    142405   0.69    7889753    57.64    8354%
Elevator #16       2400    955206   4.02   28551923   182.25    4534%
Elevator #17       1200    100755   0.38     516230     3.53     929%
Elevator #18       4800    510145   1.94     486694     3.00     155%
Total                     5397392  23.46   67334091   443.74    1891%
CFDP #1             300     50206   0.20      87937     1.71     855%
CFDP #2            1200    117897   0.52     102842     0.85     163%
Total                      168103   0.72     190779     2.56     356%
[Figure 3: three panels, (a) Minepump (enumerative SMC), (b) Elevator (enumerative
SMC), (c) CFDP (enumerative SMC); x-axis ticks: 300, 600, 1200, 2400, 4800, 9600,
19200.]

Fig. 3: Detection rate of the buggy variants achieved by classical SMC applied
variant by variant, in the three case studies and using different sample sizes. In
each figure, the x-axis is the number of samples.


(likely-buggy) parts. A striking example is property #16 (satisfied by half of the 
variants), where SMC reduces the verification time from 3 minutes to 4 seconds. 


Where SMC can detect all buggy variants, it can do so with more
efficiency compared to exhaustive search, for 33/41 properties, achieving
speedups of multiple orders of magnitude.


5.3 RQ3: Family-based SMC versus Enumerative SMC 


Figure 3 shows the detection rate achieved by enumerative SMC for the three
case studies and different numbers of samples, while the results of
family-based SMC were shown in Figure 2. In the Minepump and Elevator cases,
enumerative SMC achieves a lower detection rate than family-based SMC. In both
cases, a Student t-test with $\alpha = 0.05$ rejects, with statistical significance, the
hypothesis that the two SMC methods yield no difference in error rate. One can
observe, for instance, that, with 600 samples, enumerative SMC achieves a
median detection rate of 31.13%, while family-based SMC achieved 99.86%. This
tends to validate our hypothesis that family-based SMC is more effective as the
variants share more executions. Indeed, on average, one state of the Minepump
is shared by 3.55 variants.
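
One plausible way to obtain such a sharing figure, which we state as an assumption, is to divide the total number of states of all variants taken individually by the number of states of the FTS. The Python sketch below applies this ratio to the state counts reported in Section 4.2; the function name is ours.

```python
def average_sharing(individual_states: int, fts_states: int) -> float:
    """Average number of variants sharing one FTS state (assumed definition)."""
    if fts_states <= 0:
        raise ValueError("the FTS must have at least one state")
    return individual_states / fts_states


# State counts reported in the Dataset paragraph of Section 4.2.
minepump = average_sharing(889_124, 250_561)
cfdp = average_sharing(2_890_399, 1_732_536)
print(f"Minepump: {minepump:.2f} variants per state")
print(f"CFDP:     {cfdp:.2f} variants per state")
```

Under this reading, CFDP states are shared by fewer variants on average than Minepump states, which is in line with the weaker results of family-based SMC on CFDP.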

In the case of CFDP, however, enumerative SMC performs systematically 
better (up to 13.95% more). Still, the difference in median detection rate tends 
to disappear as more executions are sampled. Nevertheless, CFDP illustrates the 
main drawback of family-based SMC: it can overlook counterexamples in vari- 
ants with fewer behaviours. In such cases, enumerative SMC might complement 
family-based SMC by sampling from the state space of specific variants. 


Family-based SMC can detect significantly more buggy variants than
enumerative SMC, especially when few lassos are sampled. Yet, enumerative
SMC remains useful for variants that have a tiny state space compared
to the others and can, thus, complement the family-based method.


6 Conclusion 


We proposed a new simulation-based approach for finding bugs in VIS. It applies 
statistical model checking to FTS, an extension of transition systems designed to 
model concisely multiple VIS variants. Given an LTL formula, our method results 
in either collecting counterexamples for multiple variants at once or proving the 
absence of bugs. The algorithm always converges, up to some confidence error 
which we quantify on the FTS structure by relying on results for the coupon 
collector problem. After implementing the approach within a state-of-the-art
tool, we studied its benefits and drawbacks empirically. It turns out that a small
number of samples is often sufficient to detect all buggy variants, outperforming an
exhaustive search by an order of magnitude. On the downside, we were unable to
find counterexamples for some faulty variants and properties. This calls for future
research, exploiting techniques to guide the simulation towards rare bugs/events
[7, 10, 21] or towards uncovered variants relying, e.g., on distance-based sampling
[22] or light-weight scheduling sampling [19]. Nevertheless, the positive outcome 
of our study is to show that SMC can act as a low-cost-high-reward alternative 
to exhaustive verification, which can provide thorough results in a majority of 
cases. 
Abstract. Triple Graph Grammars (TGGs) are a declarative and rule-based
approach to bidirectional model transformation. The key feature
of TGGs is the automatic derivation of various operations such as
unidirectional transformation, model synchronisation, and consistency checking.
Application conditions can be used to increase the expressiveness of
TGGs by guaranteeing schema compliance, i.e., that domain constraints
are respected by the TGG. In recent years, a series of new TGG-based
operations has been introduced leveraging Integer Linear Programming
(ILP) solvers for flexible consistency maintenance even in cases where no
strict solution exists. Schema compliance is not guaranteed, however, as
application conditions from the original TGG cannot be directly trans- 
ferred to these ILP-based operations. In this paper, we extend ILP-based 
TGG operations so as to guarantee schema compliance. We implement 
and evaluate the practical feasibility of our approach. 


Keywords: Application conditions, Triple graph grammars, Integer lin- 
ear programming 


1 Introduction 


In the context of Model-Driven Engineering (MDE), software systems are rep- 
resented as a collection of different models. Often several semantically related 
models are involved and therefore have to be kept consistent with each other. The
process of maintaining consistency among multiple models is called consistency 
management and involves various operations including (unidirectional) trans- 
formation, synchronisation, and consistency checking. Practical applications of 
consistency checking occur in the industry automation domain, where multiple 
domain-specific languages (DSLs) are used to describe complex systems.
Triple Graph Grammars (TGGs) are a declarative rule-based approach to 
specifying a bidirectional consistency relation between two modelling languages. 
The main advantage of TGGs is the possibility to derive multiple consistency 
management operations from the same formal specification. In their roadmap for 
future research on TGGs [2], Anjorin et al. name the expressiveness of the TGG
language in use as one research dimension. One way of increasing the expres- 
siveness [25] of TGGs is to ensure the satisfaction of certain constraints, such as 
multiplicities with lower and upper bounds, which are typically posed by each 
domain and should be respected by consistency maintainers. Using terminology 
from Ehrig et al. [9], so-called graph constraints consist of a premise (if), and a
set of conclusions (then). They are powerful enough to forbid certain situations 
(negative constraints), demand certain conditions (positive constraints), and en- 
force implications. One possible approach to handling constraints in the context 
of TGGs is the use of application conditions (ACs) to restrict the applicability 
of rules. The subset of ACs supported for operationalised TGGs is, however, still 
quite restricted. All approaches we are aware of only handle a subset of Negative 
Application Conditions (NACs) and mostly focus on model transformation and 
synchronisation rather than consistency checking. 


Recent work [17,18,20,24] has introduced TGG operations based on Integer 
Linear Programming (ILP). Such operations are advantageous because they im- 
plement a flexible and generic strategy for multiple consistency management 
operations, while still providing acceptable scalability for growing model sizes. 
Flexibility here means that the consistency management operations are able to 
handle cases where no strict solution exists by providing “optimal” partial re- 
sults. Graph constraints, however, have not yet been integrated in this hybrid 
ILP-TGG framework and only basic TGG language features [25] are currently 
supported. We extend this line of work by the notion of schema compliance for 
TGGs, i.e., that all derived operations respect a set of constraints, as introduced 
by Anjorin et al. [3]. Instead of trying to integrate ACs into TGG rules, we 
propose to handle domain constraints directly in the ILP-based operations, thus 
achieving schema compliance in this manner. By directly encoding graph con- 
straints as ILP constraints, we are able to handle a larger class of constraints 
than in previous work on schema compliance [3]. We apply our approach to con- 
sistency checking with given correspondence links: a basic operation that must 
be both flexible and efficient as it is often used as a “cheap” check in order to 
avoid unnecessary work and ensure hippocraticness [6]. An extension to other 
operations such as unidirectional transformation is straightforward and sketched 
at the end of this paper. Our approach can be regarded as a step towards toler- 
ant consistency management, as the largest consistent sub-triple is computed in 
case of inconsistent input models. In this case, checking all domain constraints 
in advance is not helpful as the user is only informed about the violation of 
constraints and is not provided with a partial but optimal result. 


The rest of the paper is organised as follows: Section 2 introduces a running 
example, which is used to explain the main ideas on an intuitive level in Sect. 3. 
Our contribution is compared with related work in Sect. 4. Basic definitions are 
provided in Sect. 5, and used to express the formal concepts in Sect. 6. A reference 
implementation together with an experimental evaluation is described in Sect. 7, 
before discussing extensions towards other operations in Sect. 8. Finally, Sect. 9 
concludes the paper and provides some directions for future work. 
2 Running Example


[Fig. 1. Triple of Metamodels (node and edge labels: FNetwork, INetwork, U2U, IUser,
Friendship, follows)]

To illustrate our approach, a consistency relation between simplified data models
of the social networks Facebook and Instagram is used as a running example. The
respective metamodels are depicted in Fig. 1. A FacebookNetwork consists of multiple
FacebookUsers, who can share Friendships with each other. Similarly, an
InstagramNetwork is made up of arbitrarily many InstagramUsers. In contrast to the
Facebook metamodel, the social
interaction is not expressed via Friendship nodes but by a follows relation
between InstagramUsers. To complete the triple, a correspondence metamodel 
connects the network and user classes of the two metamodels via correspondence 
types, depicted as hexagons. In the following diagrams, the prefixes Facebook 
and Instagram are abbreviated with F and I, respectively. A triple graph typed 
according to Fig. 1 is consistent if (1) the correspondence links form a bijec- 
tion between all networks and users of the two networks, and (2) the following 
additional graph constraints are satisfied: 

— We forbid two or more Friendship nodes connecting the same two Facebook- 
Users as depicted in Fig. 2. This is denoted as a negative constraint. 

— There should be a Friendship between two FacebookUsers if the corre- 
sponding InstagramUsers follow each other. This means if the premise that 
two InstagramUsers follow each other holds, the conclusion that there is 
a corresponding Friendship on Facebook should also hold. The combina- 
tion of premise and (possibly multiple) conclusions is denoted as positive 
constraint (as depicted in Fig. 3). 


[Fig. 2. NoDoubleFriendship (Forbid pattern)]
[Fig. 3. EnforceFriendship (Premise and Conclusion patterns)]


3 Main Ideas 


In this section, we demonstrate our approach by formalising the consistency 
relation from the running example as a TGG and deriving a consistency checker. 
The novelty of our approach is that we are able to guarantee schema compliance, 
i.e., that all additional graph constraints (two from the running example) are 
respected by the consistency checker. 
The consistency relation can be defined by four TGG rules depicted in
Figs. 4, 5, 6, and 7. Nodes and edges required as context (i.e., they have to
be matched to apply the rule) are depicted in black, while elements created 
by the rule are depicted in green and are annotated with a ++-markup. Ac- 
cordingly, the rule NetworkToNetwork creates a FacebookNetwork and a corre- 
sponding InstagramNetwork, whereas UserToUser creates corresponding users, 
requiring corresponding networks as context. The other two rules add relation- 
ships between two users in the two social networks. RequestFriendship cre- 
ates a follows edge in the Instagram model, while the Facebook model re- 
mains unchanged. A follows edge in the opposite direction is added between 
two InstagramUsers and a Friendship node is created for the corresponding 
FacebookUsers when the rule AcceptFriendship is applied. A triple graph is 
consistent if it can be generated using the four rules of the TGG and if it fulfils 
the two graph constraints. 


[Fig. 4. Rule NetworkToNetwork]
[Fig. 5. Rule UserToUser]
[Fig. 6. Rule RequestFriendship]
[Fig. 7. Rule AcceptFriendship]


To determine if a given triple is contained in the language of a TGG and 
fulfils all additional graph constraints, we try to find a set of rule applications 
that marks the input triple entirely while fulfilling all generated ILP constraints. 
If this is impossible, we conclude that the given triple is inconsistent and pro- 
vide a consistent sub-triple with maximum number of elements as result. Five 
constraint types and the construction of the objective function are briefly intro- 
duced using the example instance depicted in Fig. 8 which can be generated by 
the TGG but violates the constraint NoDoubleFriendship. The elements are
annotated with variables which correspond to those rules that potentially mark the
respective element, i.e. NetworkToNetwork ($d_1$), UserToUser ($d_2$, $d_3$),
RequestFriendship ($d_4$, $d_5$) and AcceptFriendship ($d_6$, $d_7$). A variable is set to 1 if
the associated rule application is chosen to be applied to create the solution
graph. Furthermore, Fig. 8 also depicts all matches for NoDoubleFriendship
($p_8$), the premise of EnforceFriendship ($p_9$)¹ and the conclusion for
EnforceFriendship ($c_{10}$, $c_{11}$). To allow for uniform handling, negative constraints are
represented as graph constraints with a premise but no conclusions. 

Context for rules: The applicability of rules that require elements as context
depends on previous rule applications that have created these elements. In the
example instance, the application of UserToUser (d2, d3) implies that the rule
NetworkToNetwork (d1) was applied already, because the INetwork is required
as context. ILP implication constraints of the form
$d_i \Rightarrow (d_{j_1} \lor \dots \lor d_{j_m}) \land \dots \land (d_{k_1} \lor \dots \lor d_{k_n})$
are thus created for all rule applications $d_i$ with required context
elements $j, \dots, k$, and rule applications $(d_{j_1}, \dots, d_{j_m}, d_{k_1}, \dots, d_{k_n})$
that can mark these elements.


Exclusions for rules: As elements should only be marked once, multiple rule
applications that mark the same element exclude each other. The follows
edges between two InstagramUsers can be marked both by applications of
RequestFriendship (d4, d5) and AcceptFriendship (d6, d7). For each element
that can be marked by multiple rule applications $d_i, \dots, d_j$, an ILP exclusion
constraint $d_i \oplus \dots \oplus d_j$ is created.

Context for premises: Similar to ILP implication constraints for rules, matches 
for the premises of graph constraints also depend on context provided by other 
rule applications (whereas no elements are marked by those matches, so there 
are no context dependencies among them). However, as soon as the context is 
provided completely, the premise is fulfilled. The implication constraint is thus 
in the opposite direction: Choosing a subset of rule applications $d_i, \dots, d_j$ that
is sufficient to create the context for a premise match $p_k$ implies that $p_k$ has to
be chosen.

Context for conclusions: For a conclusion of a graph constraint to hold, all 
required elements have to be marked, which is reflected in a constraint similar to 
the context constraint for rules. In the concrete example, there are two matches 
(c10, c11) for the conclusion of EnforceFriendship (differing in F1 and F2 as
Friendship nodes). 

Implications for graph constraints: The semantics of premise and conclu- 
sion(s) is reflected in the implications for graph constraints, which define that 
the presence of a premise match implies the existence of a corresponding conclusion
match. p8 as a negative constraint is represented as a graph constraint
with a premise but no conclusions, whereas p9 implies c10 or c11 to be satisfied.

Objective function: In order to find a consistent solution for the given input,
it is necessary to find a set of rule applications that marks the input models 
entirely. The objective function maximizes the number of marked elements, i.e. 
each variable associated with a rule application is weighted with the number of 
elements it marks, and the weighted sum is maximised. Variables associated with 
constraints need not be taken into account because they do not create elements. 


1 To simplify the solution, we omit symmetric matches that lead to more ILP con- 
straints but neither change the result nor provide additional insight. 
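Context for rules:

- $d_2 \Rightarrow d_1$
- $d_3 \Rightarrow d_1$
- $d_4 \Rightarrow d_1 \land d_2 \land d_3$
- $d_5 \Rightarrow d_1 \land d_2 \land d_3$
- $d_6 \Rightarrow d_1 \land d_2 \land d_3 \land d_5$
- $d_7 \Rightarrow d_1 \land d_2 \land d_3 \land d_4$

Exclusions for rules:

- $d_4 \oplus d_6$
- $d_5 \oplus d_7$

Context for premises:

- $d_2 \land d_3 \land d_6 \land d_7 \Rightarrow p_8$
- $d_2 \land d_3 \land (d_4 \lor d_6) \land (d_5 \lor d_7) \Rightarrow p_9$

Context for conclusions:

- $c_{10} \Rightarrow d_2 \land d_3 \land (d_4 \lor d_6) \land (d_5 \lor d_7) \land d_6$
- $c_{11} \Rightarrow d_2 \land d_3 \land (d_4 \lor d_6) \land (d_5 \lor d_7) \land d_7$

Implications for graph constraints:

- $p_8 \Rightarrow false$
- $p_9 \Rightarrow c_{10} \lor c_{11}$

Objective Function: max. $3d_1 + 5d_2 + 5d_3 + d_4 + d_5 + 4d_6 + 4d_7$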


Fig. 8. Example instance with annotations for rule applications and constraint matches


All context elements in the example instance can be marked by setting d1, d2,
d3, d6 and d7 to 1 and d4 and d5 to 0, leading to an objective function value
of 21, equal to the total number of elements. This marking would, however, violate
the constraint NoDoubleFriendship, as U1 and U2 are connected by two
Friendship nodes. This violation is reflected in the ILP constraints as well: The
first context constraint for premises enforces setting p8 to 1, which immediately
contradicts the first implication for graph constraints. As no other subset of rule
applications is able to mark the input triple entirely, the consistency check fails.
The optimal solution, representing the maximal consistent sub-triple, is achieved
either by exchanging d4 and d6 or d5 and d7 in the set of chosen rule applications,
decreasing the objective function value to 18 and leaving one Friendship node
and the two connecting friends edges unmarked. Note that for this example,
the objective function and hard constraints contradict each other, emphasising
the fact that constraints must be taken into account when computing optimal
partial solutions.
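The example ILP is small enough to be checked by exhaustive enumeration. The following is a minimal sketch, not the eMoflon implementation: it transcribes the constraints listed for Fig. 8, reads each exclusion as "at most one of the two" (an assumption in line with marking elements at most once), and uses the hypothetical helper names `feasible` and `solve`.

```python
from itertools import product

# Number of elements marked by each rule application (objective function weights).
WEIGHTS = {"d1": 3, "d2": 5, "d3": 5, "d4": 1, "d5": 1, "d6": 4, "d7": 4}
NAMES = list(WEIGHTS) + ["p8", "p9", "c10", "c11"]


def implies(a, b):
    return (not a) or b


def feasible(v):
    """Check all constraints of the example for one 0/1 assignment v."""
    d1, d2, d3, d4, d5, d6, d7 = (v[f"d{i}"] for i in range(1, 8))
    p8, p9, c10, c11 = v["p8"], v["p9"], v["c10"], v["c11"]
    mutual_follows = d2 and d3 and (d4 or d6) and (d5 or d7)
    return all([
        # context for rules
        implies(d2, d1), implies(d3, d1),
        implies(d4, d1 and d2 and d3), implies(d5, d1 and d2 and d3),
        implies(d6, d1 and d2 and d3 and d5),
        implies(d7, d1 and d2 and d3 and d4),
        # exclusions for rules (assumed reading: at most one per element)
        not (d4 and d6), not (d5 and d7),
        # context for premises
        implies(d2 and d3 and d6 and d7, p8),
        implies(mutual_follows, p9),
        # context for conclusions
        implies(c10, mutual_follows and d6),
        implies(c11, mutual_follows and d7),
        # implications for graph constraints
        not p8,
        implies(p9, c10 or c11),
    ])


def solve():
    """Return the optimal objective value and all optimal sets of rule applications."""
    best, optima = -1, set()
    for bits in product([False, True], repeat=len(NAMES)):
        v = dict(zip(NAMES, bits))
        if not feasible(v):
            continue
        value = sum(w for name, w in WEIGHTS.items() if v[name])
        chosen = frozenset(name for name in WEIGHTS if v[name])
        if value > best:
            best, optima = value, {chosen}
        elif value == best:
            optima.add(chosen)
    return best, optima
```

Under these assumptions, `solve()` yields the objective value 18 and the two symmetric optima discussed in the text, one containing d4 and d7 and one containing d5 and d6.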
4 Related Work


Our contribution builds upon and extends the existing work on combining TGGs 
and ILP [17, 18, 20, 24]. This previous work covers the basic idea of modelling 
consistency checking without correspondence links as a search problem [17, 20], 
a proof for correctness and completeness [18], and a generalisation to include 
other operations such as unidirectional transformation and consistency checking 
with correspondence links [24]. Only basic TGG rules without graph constraints 
or ACs are handled, meaning that schema compliance cannot be guaranteed. 

To the best of our knowledge, all existing TGG-based approaches ensure 
schema compliance by enriching a provided TGG with suitable ACs. Ehrig et al. 
introduce NACs to TGG and prove correctness and completeness for unidirec- 
tional model transformation [10]. Golas et al. [13] extend these results to more 
general ACs for TGGs but only cover the direct application of TGG rules, i.e., 
model triple generation. In both cases, the runtime efficiency and thus practical 
feasibility of the derived operations is beyond scope. With a focus on guaran- 
teeing polynomial runtime, Klar et al. [16] present a translation algorithm with 
polynomial runtime for correct and complete TGG-based unidirectional model 
transformation. Klar et al. restrict the class of supported NACs to NACs that 
are only used to guarantee schema compliance, arguing that (i) such NACs can 
be supported efficiently, (ii) are still very useful in practice to guarantee schema 
compliance, and (iii) can also be efficiently supported by model synchronisa- 
tion algorithms (as later demonstrated [19]). Anjorin et al. [3] show that this 
restricted class of “schema compliance” NACs can be automatically generated 
from negative constraints and is thus equivalent to providing negative constraints 
together with a TGG. All these approaches, however, can only handle negative 
constraints that are contained in a single domain, as the derivation of forward 
and backward transformations can only handle “domain separable” NACs. 

Similar to our hybrid TGG/ILP approach, Callow and Kalawski [5] combine
model transformation and Mixed Integer Linear Programming (MILP) optimiza- 
tion techniques but focus on model compliance for forward transformations and 
not on deriving multiple consistency management operations. Xiong et al. [26] 
solve consistency management tasks using the Haskell-based language Beanbag. 
That approach, however, considers implicit constraints and correspondences and
is tailored to Unified Modeling Language (UML) structures.

There are also purely constraint-based approaches [11, 14, 21] that encode
both model structure and consistency relation into constraints and can easily 
handle schema compliance. This comes at a price, however, as the underlying 
constraint solvers do not scale with model-size and cannot compete with other 
approaches [1]. Our hybrid TGG/ILP approach is a compromise that leverages 
the flexibility of constraint solvers but still scales reasonably well [24] as the 
variables of the ILP problem are matches and not model elements. 

There are also various constraint-based approaches that use bio-inspired 
meta-heuristics and could also handle schema compliance. The tool MOMoT [12] 
realises model transformation based on evolutionary algorithms as a search strat- 
egy for rule orchestration. Similarly, the multi-objective optimisation technique 
Design Space Exploration (DSE) is used by Denil et al. [7] in combination with
the T-core transformation framework [23]. In their tool MOTOE [15], Kessen- 
tini et al. extract transformation blocks from examples and use Particle Swarm 
Optimisation (PSO) as a search technique. In general, approaches that use meta- 
heuristics can potentially scale better than exact search-based approaches, but 
have to sacrifice hard guarantees of correctness, completeness, and optimality of 
partial solutions. 


5 Preliminary Definitions 


Our basic definitions are adapted from Ehrig et al. [9], supplemented by the 
definition of schema compliance [3]. TGGs are a declarative rule-based approach 
which describes a language of triples of graphs. For that, we use the categorical 
definition of graphs, treating graphs as objects and graph morphisms as arrows, 
injectively mapping elements of one graph to those of another. 


Definition 1 (Graph (Morphism)). 

A graph $G = (V, E, src, trg)$ consists of a set $V$ of nodes (vertices), a set $E$ of
edges, and two functions $src, trg : E \to V$ that assign each edge a source and
target node, respectively. The set $elem(G) = V \cup E$ denotes the union of vertices
and edges. Given graphs $G = (V, E, src, trg)$, $G' = (V', E', src', trg')$, a graph
morphism $f : G \to G'$ consists of two functions $f_V : V \to V'$ and $f_E : E \to E'$
such that $src ; f_V = f_E ; src'$ and $trg ; f_V = f_E ; trg'$. The $;$ operator denotes
the composition of functions: $(f ; g)(x) := g(f(x))$.
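The two commutation conditions of Def. 1 can be checked mechanically. The sketch below is illustrative only; the class `Graph` and the function `is_morphism` are hypothetical names, and graphs are represented with plain Python sets and dictionaries.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    nodes: set
    edges: set
    src: dict  # edge -> source node
    trg: dict  # edge -> target node

    def elem(self):
        """elem(G): union of nodes and edges (names are assumed to be disjoint)."""
        return self.nodes | self.edges


def is_morphism(g, h, f_v, f_e):
    """Check that (f_v, f_e) is a graph morphism from g to h."""
    # f_V and f_E must be total functions into the nodes and edges of h.
    if set(f_v) != g.nodes or set(f_e) != g.edges:
        return False
    if not (set(f_v.values()) <= h.nodes and set(f_e.values()) <= h.edges):
        return False
    # src ; f_V = f_E ; src'  and  trg ; f_V = f_E ; trg'
    return all(
        f_v[g.src[e]] == h.src[f_e[e]] and f_v[g.trg[e]] == h.trg[f_e[e]]
        for e in g.edges
    )
```

Mapping an edge together with its source and target preserves the structure; swapping the images of source and target while keeping the edge image breaks the first condition.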


Based on Def. 1 triple graphs and triple morphisms can also be defined cate- 
gorically. A triple graph consists of a correspondence graph with a unique mor- 
phism to a source graph and a target graph each. An example for such a triple 
graph is depicted in Fig. 8. Source and target graph are interchangeable, such 
that the choice for source and target between the Facebook model and the 
Instagram model is just a question of design. 


Definition 2 (Triple Graph (Morphism)). 

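A triple graph $G = G_S \xleftarrow{\gamma_S} G_C \xrightarrow{\gamma_T} G_T$ consists of graphs $G_S$, $G_C$, $G_T$ and graph
morphisms $\gamma_S : G_C \to G_S$ and $\gamma_T : G_C \to G_T$. $elem(G)$ denotes the union
$elem(G_S) \cup elem(G_C) \cup elem(G_T)$. A triple morphism $f : G \to G'$ with
$G' = G'_S \xleftarrow{\gamma'_S} G'_C \xrightarrow{\gamma'_T} G'_T$ is a triple $f = (f_S, f_C, f_T)$ of graph morphisms where
$f_X : G_X \to G'_X$, $X \in \{S, C, T\}$, $\gamma_S ; f_S = f_C ; \gamma'_S$ and $\gamma_T ; f_T = f_C ; \gamma'_T$.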


In this setting, we introduce typing by demanding a type (triple) morphism 
to a chosen type (triple) graph. In Fig. 5, network nodes and user nodes can be 
distinguished by typing information, for instance. The language of a type (triple) 
graph TG is the set of (triple) graphs typed over TG. 


Definition 3 (Typed Triple Graph (Morphism)). 
A typed triple graph $\hat{G} = (G, type)$ is a triple graph $G$ together with a triple
morphism $type : G \to TG$ to a distinguished type triple graph $TG$. A typed triple
morphism $f : \hat{G} \to \hat{G}'$ is a triple morphism $f : G \to G'$ with $type = f ; type'$,
where $\hat{G} = (G, type)$, $\hat{G}' = (G', type')$. $L(TG) := \{G \mid \exists\, type : type(G) = TG\}$
denotes the set of all triple graphs of type $TG$.


In the following, all (triple) graphs and (triple) morphisms are assumed to 
be typed unless explicitly stated otherwise. A (triple) graph morphism can be 
viewed as a monotonic (triple) rule, such as depicted in Fig. 4, 5, 6 or 7 of the 
running example. By applying a (triple) rule on a concrete host graph, nodes 
and edges can be added to produce a new triple. (Triple) rules are applied by 
constructing a pushout, which can be interpreted as a generalised union of (triple) 
graphs R and G over a common sub-(triple)graph L: 


Definition 4 (Triple Rule (Application)). 

A triple rule $r : L \to R$ is a monomorphic (injective) triple
morphism. A direct derivation $G \overset{r@m}{\Longrightarrow} G'$ via a triple rule $r$ is
constructed by building a pushout (PO) over $r$ and a triple monomorphism
$m : L \to G$ called a match (pushout square with $r : L \to R$, $m : L \to G$,
$m' : R \to G'$ and $r' : G \to G'$). A derivation
$D : G \overset{*}{\Longrightarrow} G_n = G \overset{r_1@m_1}{\Longrightarrow} G_1 \overset{r_2@m_2}{\Longrightarrow} \dots \overset{r_n@m_n}{\Longrightarrow} G_n$
is a sequence of direct derivations. We denote by $D = \{d_1, \dots, d_n\}$
the underlying set of direct derivations included in $D$.


Starting off with the empty triple graph, all triples that can be produced by 
finitely many rule applications form the language of a TGG. 


Definition 5 (Triple Graph Grammar (Language)). 

A triple graph grammar TGG = (G, R) consists of a triple graph G, and a 
finite set R of triple rules. The triple graph language of TGG is defined as 
$L(TGG) = \{G_\emptyset\} \cup \{G \mid \exists D : G_\emptyset \overset{*}{\Longrightarrow} G\}$, where $G_\emptyset$ is the empty triple graph.


While the formal definition of rule-based triple graph generation is completed 
at this point, we want to pose further restrictions on triples by introducing 
domain constraints. Therefore, we introduce graph conditions for triple graphs 
and graph constraints as a context-independent form of graph conditions. A 
graph constraint is either satisfied trivially, if there does not exist a match for 
the premise P, or if there exists at least one match for a conclusion C;. 


Definition 6 (Graph Constraint). 

A graph constraint is a pair $gc = (p_\emptyset : G_\emptyset \to P, \{c_i : P \to C_i \mid i \in I\})$, for
some index set $I$. $P$ is referred to as the premise and $\{C_i \mid i \in I\}$ as the
conclusions of the graph constraint $gc$. A triple graph $G$ satisfies $gc$, denoted
by $G \models gc$, iff $\forall m_p : P \to G, \exists i \in I \land \exists m_{c_i} : C_i \to G, [m_p = c_i ; m_{c_i}]$, where
$m_p$, $(m_{c_i})_{i \in I}$ are monomorphisms.
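Satisfaction of a graph constraint can be illustrated by brute-force matching on small graphs. The following sketch makes several simplifying assumptions that are not part of the formal definition: a single flat graph stands in for a triple graph, nodes carry a type, edges are (source, label, target) triples, and every conclusion pattern contains the premise nodes under the same names. The functions `matches` and `satisfies` and the example pattern are hypothetical.

```python
from itertools import permutations


def matches(pattern, host, fixed=None):
    """Yield injective, type- and edge-preserving node mappings pattern -> host.

    A graph is a pair (nodes, edges): nodes maps node name -> type,
    edges is a set of (source, label, target) triples.
    """
    p_nodes, p_edges = pattern
    h_nodes, h_edges = host
    fixed = dict(fixed or {})
    free = [n for n in p_nodes if n not in fixed]
    unused = [n for n in h_nodes if n not in fixed.values()]
    for image in permutations(unused, len(free)):
        m = {**fixed, **dict(zip(free, image))}
        types_ok = all(p_nodes[n] == h_nodes[m[n]] for n in p_nodes)
        edges_ok = all((m[s], lbl, m[t]) in h_edges for s, lbl, t in p_edges)
        if types_ok and edges_ok:
            yield m


def satisfies(host, premise, conclusions):
    """G |= gc: every premise match extends to a match of some conclusion."""
    for m_p in matches(premise, host):
        extended = any(
            any(True for _ in matches(c, host, fixed=m_p)) for c in conclusions
        )
        if not extended:
            return False
    return True


# Hypothetical premise of a NoDoubleFriendship-like negative constraint.
two_friendships = (
    {"a": "User", "b": "User", "f": "Friendship", "g": "Friendship"},
    {("f", "friends", "a"), ("f", "friends", "b"),
     ("g", "friends", "a"), ("g", "friends", "b")},
)
```

With an empty list of conclusions, `satisfies` returns `False` as soon as a single premise match exists, which mirrors the treatment of negative constraints as graph constraints with a premise but no conclusions.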


A type graph TG along with a set of graph constraints is referred to as a schema
for graphs. In the running example, the schema consists of the metamodel (Fig. 1)
and the graph constraints depicted in Fig. 2 and 3. A (triple) graph complies with
a schema if it is typed over TG and fulfils all graph constraints.
Definition 7 (Schema Compliance).

A schema is a pair $(TG, GC)$ of a type triple graph $TG$ and a set $GC \subseteq L(TG)$
of graph constraints. Let $L(TG, GC) := \{G \in L(TG) \mid \forall gc \in GC,\ G \models gc\}$
denote the set of all schema-compliant triple graphs.


Finally, a triple graph is denoted as consistent with respect to a schema and 
a TGG if it is schema-compliant and contained in the language of the TGG. 


Definition 8 (Consistency). 
Given a triple graph grammar TGG and a schema (TG,GC), a triple graph G 
is said to be consistent iff $G \in L(TGG) \cap L(TG, GC)$.


6 Correctness and Completeness 


We now formalise our approach to guarantee correctness and completeness, i.e., 
the consistency check succeeds if and only if the input model is consistent. As our 
approach extends seminal work by Leblebici et al. [20], [18] and Weidmann et 
al. [24] towards graph constraints, large parts of the formalisation originate from 
these sources in an adapted version. The novelty of this section is the integration 
of graph constraints into this formal framework (Def. 10, 12, 15, 18), as well as 
showing that formal properties still hold in a setting with graph constraints 
(Def. 21 ff.), assuming that the TGG at hand is progressive (Def. 23), i.e. each 
rule application marks at least one element. 

In the original definition of TGGs (Def. 5), triples are generated by creating 
elements in source, correspondence and target graph simultaneously. For con- 
sistency checking, a TGG can be operationalised to check if a given triple is 
contained in the language of a TGG. In this case, elements are marked by rule 
applications instead of being created. To determine if a concrete triple graph is a 
member of the language of a TGG, one searches for a derivation sequence start- 
ing with the empty triple graph (cf. Def. 5) and producing the triple graph. The 
consistency checking operation derived from a TGG does not modify the input 
triple but instead marks this graph by successive rule applications in the course 
of a derivation sequence. An operational rule, derived from a corresponding triple 
rule, requires its context elements to be marked already. 


Definition 9 (Operational Rule and Marking Elements). 


Given a triple rule $r : L \to R$, the operational rule $cr : CL \to CR$ for $r$ is
constructed as follows: It holds $CL = CR = R$, and $cr : CL \to CR = id_{CR}$.
An element $e \in elem(R)$ is a marking element of $cr$ iff
$\nexists e' \in elem(L)$ with $r_S(e') = e$ or $r_C(e') = e$ or $r_T(e') = e$.
For operational rules, elements can be partitioned into those which are created
by the original TGG rule (marked elements) and those which must be provided
as context (required elements). Graph constraints do not mark elements;
therefore, only the sets of elements required by premise and conclusion,
respectively, are defined.


Definition 10 (Marked and Required Elements). 


For a direct derivation $d : G \overset{cr@cm}{\Longrightarrow} G$ via an operational rule $cr : CL \to CR$, the
following sets are defined:

- $mrk(d) = \{e \in elem(G) \mid \exists e' \in elem(CL),\ cm(e') = e$ where $e'$ is a marking element of $cr\}$
- $req(d) = \{e \in elem(G) \mid \exists e' \in elem(CL),\ cm(e') = e$ where $e'$ is not a marking element of $cr\}$

For a graph constraint $gc = (p_\emptyset : G_\emptyset \to P, \{c_i : P \to C_i \mid i \in I\})$, we define:

- $req(p_\emptyset) = \{e \in elem(G) \mid \exists e' \in elem(P),\ m_p(e') = e\}$
- $req(c_i) = \{e \in elem(G) \mid \exists e' \in elem(C_i),\ m_{c_i}(e') = e\}$, $i \in I$
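The split into marked and required elements can be sketched as follows. This is an illustration under simplifying assumptions: rules are given as sets of element names with L contained in R, a match is a dictionary from rule elements to host elements, and all element names, chosen to resemble UserToUser, are hypothetical.

```python
def marking_elements(lhs, rhs):
    """Elements of R without a preimage in L (L is assumed to be a subset of R)."""
    return rhs - lhs


def mrk_and_req(lhs, rhs, cm):
    """Split the image of a match cm : CL -> G (with CL = R) into mrk(d) and req(d)."""
    marking = marking_elements(lhs, rhs)
    mrk = {cm[e] for e in rhs if e in marking}
    req = {cm[e] for e in rhs if e not in marking}
    return mrk, req


# Hypothetical element names: the networks and their correspondence are context,
# the two users, their correspondence and both users edges are created.
lhs = {"fn", "n2n", "in"}
rhs = lhs | {"fu", "u2u", "iu", "f_users", "i_users"}
cm = {e: e.upper() for e in rhs}  # identity-like match into a host graph

mrk, req = mrk_and_req(lhs, rhs, cm)
```

In this sketch the rule application marks five elements and requires three, which agrees with the weights 5 and 3 used for UserToUser and NetworkToNetwork in the objective function of the running example.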


All candidate rule applications are associated with a binary variable which 
indicates by its value (0 or 1) whether the candidate is considered within the final 
solution. To determine the variable assignment, all candidates are collected and 
handed over to an ILP solver to determine the optimal subset of rule applications 
(cf. Sect. 2) respecting all linear constraints. 


Definition 11 (Constraints for Derivations). 

Given a triple graph $G$, let $D : G \overset{*}{\Longrightarrow} G$ be a derivation via operational rules with
the underlying set $D$ of direct derivations. For each direct derivation $d_1, \dots, d_n \in D$,
respective binary variables $\delta_1, \dots, \delta_n$ with $\delta_1, \dots, \delta_n \in \{0, 1\}$ are defined. A
linear constraint $LC$ for $D$ is a conjunction of linear inequalities which involve
$\delta_1, \dots, \delta_n$. A set $D' \subseteq D$ fulfils $LC$, denoted as $D' \vdash LC$, iff $LC$ is satisfied for
variable assignments $\delta_i = 1$ if $d_i \in D'$ and $\delta_i = 0$ if $d_i \notin D'$, $1 \le i \le n$.


Graph constraints are also associated to binary variables to ensure that only 
schema-compliant triples pass the consistency check, while premises and each of 
the corresponding conclusions are split into separate constraints. In contrast to 
the binary variables for rule applications, the value assignment cannot be chosen 
by the ILP solver. Instead, any variable assignment which does not violate the 
linear constraints is fine, as they ensure schema-compliance by the interrelations 
of rule applications and graph constraints. 


Definition 12 (Constraints for Graph Constraints). 

Let $GC = \{(p_\emptyset : G_\emptyset \to P, \{c_i : P \to C_i \mid i \in I\})\}$ be a set of graph constraints.
For each graph constraint $gc \in GC$, respective binary variables $\pi_1 \dots \pi_n$
for the premises and $\gamma_{1,1} \dots \gamma_{1,m_1} \dots \gamma_{n,1} \dots \gamma_{n,m_n}$ for the conclusions are defined.
A linear constraint $LC$ for $GC$ is a conjunction of linear inequalities which
involve $\pi_1 \dots \pi_n$ and $\gamma_{1,1} \dots \gamma_{1,m_1} \dots \gamma_{n,1} \dots \gamma_{n,m_n}$. A triple graph $G$ fulfils $LC$,
denoted as $G \models LC$, iff $LC$ is satisfied for any variable assignment $\{\pi_1 \dots \pi_n\} \to \{0, 1\}$,
$\{\gamma_{1,1} \dots \gamma_{1,m_1} \dots \gamma_{n,1} \dots \gamma_{n,m_n}\} \to \{0, 1\}$.


As the operational rules reflect the behaviour of the original rules of the 
underlying TGG, multiple markings for the same elements must be prohibited 
as this would mean that an element is created multiple times. For each node 
and edge, a linear constraint is created that ensures that this element is marked 
at most once in order to guarantee schema compliance and containment in the 
language of the TGG later on. 


Definition 13 (Sum of Alternative Markings for an Element). 

Given a triple graph $G$, let $D : G \overset{*}{\Longrightarrow} G$ be a derivation via operational rules
with the underlying set $D$ of direct derivations. For each element $e \in elem(G)$,
let $E(e) = \{d \in D \mid e \in mrk(d)\}$. The integer $mrkSum(e)$ denotes the sum of the
associated variable assignments for each $d \in E(e)$:

$mrkSum(e) = \sum_{d_i \in E(e)} \delta_i$


Definition 14 (Constraint 1: Mark Elements at Most Once). 
Given a triple graph $G$, let $D : G \overset{*}{\Longrightarrow} G$ be a derivation via operational rules:

$markedAtMostOnce(G) = \bigwedge_{e \in elem(G)} [\, mrkSum(e) \le 1 \,]$


The reason for the sum of marked elements not being strictly equal to 1 is the 
desired treatment of inconsistent inputs: The system should still be feasible in 
case of inconsistent inputs and a maximal consistent sub-triple should be the 
result of the optimisation step. 
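A small sketch of Def. 13 and Def. 14 shows why the bound is "at most once" rather than "exactly once". The dictionaries `mrk` and `delta` and the element names are hypothetical; `mrk` maps each candidate rule application to the set of elements it marks, `delta` holds the 0/1 choice for each candidate.

```python
def mrk_sum(element, mrk, delta):
    """Sum of delta_i over all candidates d_i that mark `element`."""
    return sum(delta[d] for d, marked in mrk.items() if element in marked)


def marked_at_most_once(elements, mrk, delta):
    """Constraint 1: no element is marked by more than one chosen candidate."""
    return all(mrk_sum(e, mrk, delta) <= 1 for e in elements)


def unmarked(elements, mrk, delta):
    """Elements left out of the consistent sub-triple by the chosen candidates."""
    return {e for e in elements if mrk_sum(e, mrk, delta) == 0}


# Hypothetical excerpt: one follows edge can be marked by d4 or by d6,
# the Friendship node F1 only by d6.
elements = {"follows12", "F1"}
mrk = {"d4": {"follows12"}, "d6": {"follows12", "F1"}}
```

Choosing both candidates violates the constraint, while choosing only d4 is feasible and leaves F1 unmarked. An equality constraint would rule out the second assignment as well and make the ILP infeasible for inconsistent inputs.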

The following constraint ensures that the required context elements for oper- 
ational rule applications as well as premises and conclusions are provided in the 
final solution, such that the original TGG rule is guaranteed to be applicable in 
this situation and the marked part of the triple graph is schema-compliant. 


Definition 15 (Constraint 2: Guarantee Context). 

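Given a triple graph $G$ and a schema $(TG, GC)$, let $D : G \overset{*}{\Longrightarrow} G$ be a derivation
via operational rules with the underlying set $D$ of direct derivations. For
each direct derivation $d \in D$ and each graph constraint $gc \in GC$, the following
constraints are defined:

$con(d) = \bigwedge_{e \in req(d)} [\, \delta \le mrkSum(e) \,]$

$con(p_\emptyset) = \bigvee_{e \in req(p_\emptyset)} [\, mrkSum(e) \le \pi \,]$

$con(c_i) = \bigwedge_{e \in req(c_i)} [\, \gamma_i \le mrkSum(e) \,]$, $i \in I$

$context(D) = \bigwedge_{d \in D} con(d) \land \bigwedge_{gc \in GC} \big[ con(p_\emptyset) \land \bigwedge_{i \in I} con(c_i) \big]$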
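The three context constraints can be evaluated for a fixed variable assignment as sketched below. The sketch reads all inequalities of Def. 15 as non-strict and assumes that every mrkSum(e) is 0 or 1, as guaranteed by Constraint 1; the function names and the example values are hypothetical.

```python
def con_rule(delta_d, req_d, mrk_sum):
    """con(d): a chosen rule application needs all its context elements marked."""
    return all(delta_d <= mrk_sum[e] for e in req_d)


def con_premise(pi, req_p, mrk_sum):
    """con(p): pi may only stay 0 if at least one premise element is unmarked."""
    return any(mrk_sum[e] <= pi for e in req_p)


def con_conclusion(gamma_i, req_c, mrk_sum):
    """con(c_i): a conclusion match only counts if all its elements are marked."""
    return all(gamma_i <= mrk_sum[e] for e in req_c)


# Hypothetical marking: both users and F1 are marked, F2 is not.
mrk_sum = {"U1": 1, "U2": 1, "F1": 1, "F2": 0}
```

Note the opposite directions: for rules and conclusions the variable forces the marking, whereas for premises a complete marking forces the variable.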
There are constellations in which rule application candidates mutually provide
context for each other in a dependency cycle, such that parts of the graph
could be potentially marked by these rules, but none of them can ever be applied 
first because the necessary context is not yet there. Therefore, we introduce a 
relation > among rule applications to arrange them in a proper order. 


Definition 16 (Dependency Cycles). 

Let $D : G \overset{*}{\Longrightarrow} G$ be a derivation via operational rules with the underlying set
$D$ of direct derivations. A relation $> \subseteq D \times D$ between $d_i, d_j \in D$ is defined as
follows:

$d_i > d_j$ iff $req(d_i) \cap mrk(d_j) \neq \emptyset$

A set $cy \subseteq D$ with $cy = \{d_1, \dots, d_n\}$ of direct derivations is a dependency
cycle iff $d_1 > \dots > d_n > d_1$.


The following constraint breaks dependency cycles by forbidding that all
rule applications of a cycle are chosen together for the final solution.


Definition 17 (Constraint 3: Forbid Dependency Cycles). 

Given a triple graph $G$, let $D : G \overset{*}{\Longrightarrow} G$ be a derivation via operational rules with
the underlying set $D$ of direct derivations, and let $CY$ be the set of all dependency
cycles $cy \subseteq D$. A linear constraint $acyclic(D)$ is defined as follows:

$acyclic(D) = \bigwedge_{cy \in CY,\ cy = \{d_1, \dots, d_n\}} \Big[ \sum_{i=1}^{n} \delta_i < n \Big]$
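Dependency cycles and the resulting constraint can be computed as sketched below for small candidate sets. The enumeration is a naive depth-first search, not necessarily what the tool does; `req` and `mrk` map hypothetical candidate names to their required and marked elements.

```python
def dependency_cycles(req, mrk):
    """Return all dependency cycles as sets of candidates (d_i > d_j per Def. 16)."""
    succ = {a: {b for b in mrk if req[a] & mrk[b]} for a in req}
    cycles = set()

    def walk(start, node, path):
        for nxt in succ[node]:
            if nxt == start:
                cycles.add(frozenset(path))
            elif nxt not in path and nxt > start:
                # only visit candidates larger than the start, so that each
                # cycle is found from its smallest member
                walk(start, nxt, path + [nxt])

    for d in succ:
        walk(d, d, [d])
    return cycles


def acyclic(cycles, delta):
    """Constraint 3: not all members of any dependency cycle are chosen."""
    return all(sum(delta[d] for d in cy) < len(cy) for cy in cycles)


# Hypothetical candidates: a needs x (marked by b), b needs y (marked by a).
req = {"a": {"x"}, "b": {"y"}, "c": set()}
mrk = {"a": {"y"}, "b": {"x"}, "c": {"z"}}
cycles = dependency_cycles(req, mrk)
```

Here `a` and `b` form the only cycle, so any solution may contain at most one of them, while `c` is unaffected.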


While the previous constraint types guarantee containment in the language 
of the TGG at hand as well as context constraints for premises and conclusions, 
Constraint 4 expresses the semantics of graph constraints to achieve schema- 
compliance. Thereby, the linear constraint is very similar to the definition for 
satisfaction of graph constraints (Def. 6). It is possible to formulate this con- 
straint independent of the concrete rule application because only graph con- 
straints are supported instead of arbitrary graph conditions. 


Definition 18 (Constraint 4: Satisfy Graph Constraints). 

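Let $(TG, GC = \{(p_\emptyset : G_\emptyset \to P, \{c_i : P \to C_i \mid i \in I\})\})$ be a schema. A linear
constraint $sat(G)$ expressing that $G$ fulfils all graph constraints of $GC$ is defined
as follows:

$sat(G) = \bigwedge_{gc \in GC} \Big[ \pi \Rightarrow \bigvee_{i \in I} \gamma_i \Big]$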
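For an ILP solver, the logical connectives have to be expressed as linear inequalities over the binary variables. The following is a sketch of the standard encodings, assuming that Constraint 4 is read as an implication from the premise variable to the disjunction of conclusion variables, that every $mrkSum(e)$ is 0 or 1 (Constraint 1), and that $req(p_\emptyset)$ is non-empty; the encoding actually used by the tool may differ.

```latex
\begin{align*}
% Constraint 4: a premise match requires at least one conclusion match
\pi \Rightarrow \bigvee_{i \in I} \gamma_i
  &\quad\Longleftrightarrow\quad \pi \le \sum_{i \in I} \gamma_i \\
% negative constraint: I is empty, so the sum is 0 and the premise is forbidden
\pi \Rightarrow \mathit{false}
  &\quad\Longleftrightarrow\quad \pi \le 0 \\
% context for premises: the disjunction over unmarked elements
\bigvee_{e \in req(p_\emptyset)} [\, mrkSum(e) \le \pi \,]
  &\quad\Longleftrightarrow\quad
  \sum_{e \in req(p_\emptyset)} mrkSum(e) \le |req(p_\emptyset)| - 1 + \pi
\end{align*}
```

The last line can be checked by cases: for $\pi = 1$ both sides hold trivially, and for $\pi = 0$ both sides state that at least one required element is unmarked.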


Finally, the objective function can be defined to maximize the number of 
markings over the entire input triple, while ensuring that no correctness con- 
straints are violated and the result is schema-compliant according to Def. 7. 
Definition 19 (Optimisation Problem).
Given a triple graph $G$ and a schema $(TG, GC)$, let $D : G \overset{*}{\Longrightarrow} G$ be a derivation
via operational rules. The ILP to be optimised is constructed as follows:

max. $\sum_{d \in D} \delta \cdot |mrk(d)|$ s.t. $markedAtMostOnce(G) \land context(D) \land acyclic(D) \land sat(G)$


The remainder of this section provides a proof sketch showing that the consis- 
tency check always terminates, and succeeds iff the input triple graph is con- 
sistent with respect to Def. 8. It is an extension of the proof for correctness 
and completeness in a setting without graph constraints [18,24], such that the 
focus of this version is set on schema compliance. In the following, let a TGG 
TGG = (Gg, R), a schema (TG, GC), a triple graph G, and a derivation via op- 
erational rules D : G => G with underlying set of direct derivations D be given 
for all definitions, lemmas and theorems. 

First, we define a proper subset of operational rule applications as a set which 
is associated to a feasible solution for the ILP (Def. 14, 15, 17 and 18). 


Definition 20 (Proper Subset of Rule Applications). 
A subset $D' \subseteq D$ is a proper subset of $D$ iff $D' \vdash markedAtMostOnce(G) \land
context(D) \land acyclic(D) \land sat(G)$.


Next, it is shown that there exists a sequence of the rule applications of a 
proper subset, such that the marked elements of the graph form a consistent 
triple. Furthermore, the marked part of the graph is schema-compliant. 


Lemma 1 (Consistent Portions of a Triple Graph). 
$\exists$ proper subset $D' \subseteq D \Rightarrow \exists G' \in L(TGG) \cap L(TG, GC)$ such that:

$elem(G') = \bigcup_{d' \in D'} mrk(d')$


Proof (Sketch). When all direct derivations $d \in D'$ are sequenced over the $>$
relation (Def. 16), a proper subset according to Def. 20 is formed, resulting in a
triple graph $G' \in L(TGG)$ consisting of the elements marked by $D'$. At the same
time, $G'$ will be schema-compliant iff $D' \vdash sat(G')$ as this predicate ensures that
all given graph constraints are satisfied.


We demand the property of maximality to avoid trivial solutions such as the
empty triple graph: 


Definition 21 (Maximal Proper Subset of Rule Applications). 
A proper subset D' of D is maximal if there does not exist any other proper 
subset D” of D with a greater objective function value (cf. Def. 19). 


The application of a sequenced maximal proper subset of rule applications 
on the empty triple graph is denoted as maximally marked triple graph. 


Definition 22 (Maximally Marked Triple Graph). 

Let $D'$ be a maximal, proper subset of $D$. The triple graph $G'$ identified with
$D'$ according to Lemma 1 is denoted as a maximally marked triple graph with
respect to $D$.
Theorem 1 guarantees that a triple graph that can be completely marked by
rule applications of a maximal proper subset is indeed consistent. 


Theorem 1 (Correctness). 
For a maximally marked triple graph $G'$ with respect to $D$, it holds:

$\bigcup_{d \in D'} mrk(d) = elem(G) \Rightarrow G$ is consistent

Proof (Sketch). $G' \in L(TGG)$ immediately follows from Lemma 1: As $D'$ is a
maximal proper subset, $G' \in L(TGG)$ holds, and the rule applications of $D'$ can
be sequenced, such that they can mark $G'$ entirely according to the premise of
this theorem. $G' \in L(TG, GC)$ holds as well because the choice of any $d \in D'$
leading to a violation of any $gc \in GC$ would make $sat(G')$ false. Therefore, $G'$ is
consistent according to Def. 8.


To guarantee completeness, it remains to show that the process of construct- 
ing the ILP terminates, which requires the set of possible rule applications to 
be finite. As all possible derivation sequences are collected, the ILP solver terminates
with an optimum solution iff one exists. We therefore demand the underlying
TGGs to be progressive, i.e., each operational rule is required to mark
at least one element. In fact, operational rules that do not mark elements cor- 
respond to TGG rules that do not have any effect on the host graph they are 
applied on because they cannot add any elements, and are therefore irrelevant 
for practical use. 


Definition 23 (Progressive TGGs). 
TGG is progressive if each of its operational rules has at least one marking 
element. 


Demanding the TGG at hand to be progressive, completeness can be con- 
cluded by showing that the consistency check cannot cycle. 


Theorem 2 (Completeness). 
Let TGG be progressive. A maximally marked triple graph $G'$ with respect to $D$
exists such that:

$G$ is consistent $\Rightarrow \bigcup_{d \in D'} mrk(d) = elem(G)$

Proof (Sketch). As Lemma 1 guarantees the existence of a derivation $D$, and
ILP solving always produces a maximally marked triple graph $G'$, we only need
to show the implication (equivalence follows from Thm. 1). To derive a contradiction,
we now assume that $G'$ is consistent, but that $G'$ either contains
unmarked elements or violates any constraint $gc \in GC$. From $G'$ being consistent,
it follows from the decomposition and composition theorem for TGGs and
operational rules [8, 18] that there exists a derivation sequence $D' : G \overset{*}{\Longrightarrow} G'$
with operational rules. This means that at least one rule application of $D'$ is not
contained in $D$ or $G'$ violates any $gc \in GC$. The latter is impossible, as it would
contradict the assumption that $G'$ is consistent. The former implies that the
objective function value could be increased by using $D'$ for marking $G$, which
contradicts the optimality of the result found by ILP solving.
7 Implementation and Experimental Evaluation


We investigate the impact of graph constraints on runtime performance, con- 
sidering scalability of consistency checking for growing model sizes with and 
without taking graph constraints into account, by two research questions: 


(RQ1) By which factor does the number of variables and ILP constraints increase 
when introducing graph constraints to the ILP? How does this influence the 
runtime of pattern matching, ILP construction, and ILP solving? 

(RQ2) How does the runtime performance relate to model size (number of nodes 
and edges) for consistency checking with and without graph constraints? 


Setup: We implemented our approach within the tool eMoflon² using Neo4J³
as an underlying graph pattern matcher and database for querying and stor- 
ing the models. As a test example, we took the FacebooktoInstagram TGG 
as described in Sect. 2. To obtain synthetic models, we used the derived TGG- 
based model generator to produce random models with 1078 to 226,988 elements 
(roughly the same number of nodes and edges). We then executed the derived 
TGG-based consistency checker, once taking the negative graph constraint from 
Sect. 2 into account, and once without any graph constraints. For each configu- 
ration, the number of variables and constraints of the ILP, as well as the time 
needed for pattern matching, ILP construction, and ILP solving were measured 
for 10 repeated runs. As final values, the medians of the 10 test runs were taken 
to minimize the bias introduced by outliers. All performance tests were executed 
on a standard notebook with an Intel Core i7 (1.80 GHz), 16GB RAM, and Win- 
dows 10 64-bit as operating system. An installation of Eclipse IDE for Java and 
DSL Developers, version 2019-09 with Java Development Kit (JDK) version 13 
was used. The JVM running the tests was allocated a maximum of 4GB memory, 
and 8GB were allocated to the graph database Neo4J. 

Results: Figure 9 shows the time needed for pattern matching, ILP con- 
struction, and ILP solving for different model sizes. One can observe that for 
both configurations (with and without graph constraints), the runtime of all 
components depends linearly on the number of model elements. Taking graph 
constraints into account for the consistency check makes the ILP construction 
roughly 20% - 40% slower. This is to be expected as the ILP problem is simply 
larger. For similar reasons, a difference can also be observed for the ILP solving 
step, whose runtime is negligible without constraints, but increases by a factor 
of 10 when including graph constraints. While this increase is substantial, ILP 
solving does not have a large overall impact on the runtime performance even for 
200k elements. Interestingly, pattern matching gets faster when the additional 
negative graph constraint is included. This is surprising as additional pattern 
matching is required to determine matches of the negative constraint. The
underlying graph database is heuristic-based, however, and also uses caching
strategies to decide what data to keep in memory. The pattern matching
strategy applied for the collection of patterns including the negative constraint
apparently scales better for model sizes greater than 130k.


Footnotes: eMoflon: github.com/eMoflon/emoflon-neo; Neo4J: neo4j.com; bit.ly/2BFAutd
The number of binary variables and constraints grows linearly with model size
for both settings, involving slightly more variables than constraints (cf. Fig. 10). 
With the negative graph constraint, this number increases by about 25%-50%. 

Summary: Revisiting our research questions, one can state that the number
of binary variables and constraints increases by a constant factor when introducing
(negative) graph constraints, resulting in a constant-factor increase of the overall
runtime for consistency checking. While the runtime of the ILP solving step increases
substantially and could become problematic for large models, our measurements indicate
that ILP solving is probably not the bottleneck for our example (RQ1).
In both settings (with and without the negative graph constraint), the runtime
for consistency checking increases linearly with growing model size (RQ2).

Threats to validity: The evaluation was performed with only one TGG 
consisting of only four rules, only the consistency checker (of all operations) was 
run on randomly generated synthetic instances, and we measured the additional 
price of taking only the negative graph constraint from Sect. 2 into account. 
While our initial results are positive and indicate that the additional price of 
guaranteeing schema compliance as we propose does not render the ILP-based 
TGG operations infeasible due to an explosion in runtime, extensive bench- 
marking with multiple TGGs, multiple graph constraints, larger model sizes, 
and multiple consistency management operations is required to transfer these 
results to practical, real-world applications. 


8 Extension to Other Operations 


The presented concepts are tailored to consistency checking with correspon- 
dences, i.e. source, target and correspondence model are given as inputs and are 
marked by operational rule applications, whereas all three models are simulta- 
neously created by the original rule applications. There are also other operations 
which use a mixture of creating and marking elements to complement given in- 
put models to a complete triple. Figure 11 depicts the example instance of Fig. 8 
annotated with the operations which require the respective model(s) as input. 
The previously presented CO (check only) operation gets all three models as 
input, whereas CC (correspondence creation) checks for consistency by building 
up the correspondence model for given source and target models. FWD_OPT
and BWD_OPT are operations for unidirectional transformation, i.e. either the 
source or the target model is given and a consistent transformation to the re- 
spective other domain is computed. A formal specification of the operations was 
introduced by Weidmann et al. [24]. 

All these operations are based on a common formalism that expresses
dependencies between rule applications as ILP constraints, while in contrast
to the definitions of this paper, dependencies between created elements
are also taken into account. As constraints for marked and created parts
of the triple are formed almost the same way, it is possible to transfer
the results for consistency checking respecting graph constraints to the
other operations as well. However, the formal proof which guarantees the
operations' correctness and completeness [18,24] has to be extended to take
graph constraints into account.


Fig. 11. Input models per operation


9 Conclusion and Future Work 


We presented an extension of a seminal approach to combining TGGs and ILP 
by supporting graph constraints. For consistency checking with given correspon- 
dence links, we have shown correctness and completeness of the approach. The 
results can be generalised towards other operations such as unidirectional trans- 
formations as well. Additionally, the approach was implemented in a TGG tool, 
and an experimental evaluation indicated that the scalability of the approach is 
sufficient for practical use. For future work, we plan to extend the approach to 
cope with general AC as well, increasing the expressive power of the supported 
class of TGGs. As a proof of concept, we only implemented negative constraints 
until now, which should be extended towards general graph constraints. Using 
an incremental pattern matcher with extensible matches, it should be possible 
to collect matches for the premise and corresponding conclusions at once, which 
would keep the implementation efficient. Further performance tests with other 
(industrial) examples will also be necessary to underpin the validity of the evalu- 
ation results with respect to runtime performance, as both the metamodels and 
the rule set are very restricted, whereas the considered model sizes are realistic. 
Generating consistent models first and then mutating them slightly would further
lead to a smaller and therefore more reasonable number of inconsistencies.
Abstract. Model management is a central activity in Software
Engineering. The most challenging aspect of model management is to keep models
consistent with each other while they evolve. As a consequence, there 
has been increasing activity in this area, which has produced a number 
of approaches to address this synchronization challenge. The majority of 
these approaches, however, is limited to a binary setting; i.e. the synchro- 
nization of exactly two models with each other. A recent Dagstuhl seminar 
on multidirectional transformations made it clear that there is a need for 
further investigations in the domain of general multiple model synchro- 
nization simply because not every multiary consistency relation can be 
factored into binary ones. However, with the help of an auxiliary artifact, 
which provides a global view over all models, multiary synchronization 
can be achieved by existing binary model synchronization means. In this 
paper, we propose a novel comprehensive system construction to produce 
such an artifact using the same underlying base modelling language as the 
one used to define the models. Our approach is based on the definition 
of partial commonalities among a set of aligned models. Comprehensive 
systems can be shown to generalize the underlying categories of graph 
diagrams and triple graph grammars and can efficiently be implemented 
in existing tools. 


Keywords: Model Synchronization - Multimodelling - Multidirectional 
Transformations (MX) - Inter-Model Consistency - Model Merging - 
Graph Diagrams - Triple Graph Grammars - Category Theory 


1 Introduction 


Conceptual models, i.e. abstract specifications of the system under development, 
are recognized to be of major importance in software engineering [52]. Repre- 
senting the whole system in a single global model is generally unfeasible, hence, 
different teams design and maintain several models which focus on different 
aspects of the system. This collection of inter-related models is often referred to 
as a multimodel. A rigorous use of these models within the engineering process 
eventually requires consistency management of multimodels. This is because 
the collection of models must obey global consistency rules and as models are 
inevitably subject to change, global consistency becomes an issue [16]. 


© The Author(s) 2020 
Model Synchronization represents a means to maintain global consistency of
inter-related models by combining consistency verification with (semi-)automatic 
consistency restoration. The cross-disciplinary research field Bidirectional Trans- 
formations (BX) [8] investigates such means within different communities and 
it provides a number of theoretical and practical results (see [2] for a recent 
survey). However, the majority of these approaches is limited to a binary setting, 
i.e. keeping pairs of models consistent. Stevens [44] recognized this limitation
in her outreach to the modelling community that led to an increased momentum
in this area, as evident from a recent Dagstuhl seminar on Multidirectional
Transformations (MX) [7].


One way to address multiary synchronization is to con- 
sider it as a network of well-understood binary synchroniza- 
tion problems. However, not every multiary consistency 
rule can be factored into binary ones [9]; e.g. the class 
diagrams A1, A2 and A3 in fig. 1 are pairwise consistent
but not altogether—since class inheritance is acyclic. Thus, 
multiary model synchronization is needed to keep global 
consistency. Another approach to global consistency man- 
agement is the model merge approach [6]: It constructs 
the union of all models wherein the related elements are 
identified, see lower half of fig. 1 (inter-relations given by 
sameness of class’ names). Thus, global consistency can 
be verified within a single artifact, the merge. However, 
the major drawback of this approach, apart from requiring 
additional computational overhead, is that it forgets the 
origin of elements; e.g. that class C was contained in A! and 
A? but not in A. This is a problem if global consistency 
rules depend on this containment information. 
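
The claim that pairwise consistency does not imply global consistency can be checked mechanically. The following Python sketch is an illustration only: the concrete inheritance edges are assumptions chosen to reproduce the situation (they are not read off fig. 1), and `has_cycle` is a hypothetical helper that treats "inheritance is acyclic" as the consistency rule on the union of the diagrams.

```python
import itertools


def has_cycle(edges):
    """Return True if the directed graph given as (subclass, superclass) pairs has a cycle."""
    graph = {}
    for sub, sup in edges:
        graph.setdefault(sub, set()).add(sup)
        graph.setdefault(sup, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge found: the inheritance relation is cyclic
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


# Assumed example: each class diagram contributes a single inheritance edge.
models = {
    "A1": {("C", "A")},  # C inherits from A
    "A2": {("B", "C")},  # B inherits from C
    "A3": {("A", "B")},  # A inherits from B
}

# Every pair of diagrams, merged by class name, is acyclic ...
pairwise_ok = all(
    not has_cycle(models[x] | models[y])
    for x, y in itertools.combinations(models, 2)
)
# ... but the merge of all three diagrams contains a cycle.
globally_ok = not has_cycle(set().union(*models.values()))
```

In this toy setting every binary check succeeds while the ternary check fails, which is the reason why a network of binary consistency relations is not sufficient in general.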


Fig. 1. Inconsistent class diagrams


The most important pieces of information in multiary model synchronization are the
inter-relations between models and their elements. We call the latter commonalities
and cannot generally assume that they are always given by equality of
names, as was the case in fig. 1. Thus, multimodels must be extended with such
commonality information, which allows element traceability and global consis- 
tency verification. Aligning models via an additional commonality structure has 
some tradition, e.g. it is the foundation of Triple Graph Grammars (TGGs) [40], 
a formal and mature BX approach with a focus on Model Driven Engineering 
(MDE). In the TGG approach, models are considered to have a graph based 
structure, i.e. there is a common underlying base modelling language and we will 
also stick to this idea of a common base language. 


In this paper, we propose a novel construction called comprehensive system 
which serves as a foundation for various ways of multiary model management. 
It is based on a simple, non-intrusive and easy-to-handle linguistic extension of 
the base modelling language with commonality specifications, which allows us to
work with an arbitrary number n > 2 of heterogeneously typed (local) models as 
one single (global) model. Moreover, we will show that we are still able to apply 
mature methods for model verification and restoration in the same way as for
single local models. Furthermore, we show that this approach is more expressive 
than, and overcomes the obstacles of, the model merge approach, and that it 
generalizes TGGs and graph diagrams [48] — a recent generalization of TGGs. 


Before defining comprehensive systems and their properties (sect. 5 and 6), 
we clarify terminology (sect. 2), introduce a running example (sect. 3), and
provide an overview of the state of the art (sect. 4). An extended version of the 
proofs in sect. 6 is given in the technical report [47]. 


2 Preliminaries: Multimodelling 


Every fast-moving research field is prone to produce separate terms for the same
concepts. Thus, we begin with a short definition of the most important terms in
multimodel consistency management. We will stick to the imperative of MDE
[42] and consider all Software Engineering (SE) artifacts as models: 


Model A model is an abstract specification of the system (or parts of it) under 
development. Models are atomic elements in the multimodel consistency 
management process. To be amenable for electronic processing, we assume 
them to be formal, i.e. following the format of a specific modelling language. 
We denote models by capital letters A, A', A1, A2 etc.


Metamodel and Conformance Every modelling language is specified by an 
artifact called metamodel. We denote metamodels by capital letters M, M’, M1, 
M2 etc. Models must conform to their respective metamodel, i.e. the model
must be well-structured w.r.t. the metamodel and fulfill all constraints im- 
posed on the metamodel, thus further narrowing admissible model structure. 
The model is then called an instance of the metamodel. Conformance is 
also called local or intra-model consistency. We denote a single constraint by 
lowercase φ and a set of constraints by uppercase Φ. A metamodel with a set
of constraints imposed on it will be written M_Φ.


Correspondence is a relation among a set of models. It is a consequence of 
commonalities (common concepts) shared by these models. A collection of 
models together with a correspondence among them is called a multimodel. In
a similar way as for local models, global consistency rules can be imposed
on a multimodel. It is considered (globally) consistent, if all local constraints 
and global consistency rules are fulfilled. Consistency of a multimodel is also 
referred to as inter-model consistency. 


Model Space A model space is a set of models together with changes among 
them. In an MDE setting it can be considered to be given by a metamodel M: 
The set of all instances of M together with M-respecting instance changes, 
which describe how an instance A’ is the result of edits on A. We write 
Mod(M_Φ) to denote the respective model space.
3 Use Case


We depict a collaborative modelling example within healthcare. More concretely, 
the task is to develop ICT support for a patient referral process. A referral is 
“the act of sending a patient to another physician for ongoing management of a 
specific problem with the expectation that the patient will continue seeing the 
original physician for co-ordination of total care” [41]. It is an important and 
recurring process in the healthcare domain. Hence, ICT-support is desirable [51]. 

At the same time, development remains tricky since it requires multiple actors 
(software vendors, government officials, hospitals and physicians) to agree on 
common data structures, processes and interfaces. For our example, let us assume 
that the design of the system follows a model-based approach and there are three 
different models, each covering a different aspect of the system: There is a process 
model A1 denoted in Business Process Model and Notation (BPMN) [30], a data
model A2 denoted as a Unified Modelling Language (UML) class diagram [32],
and a decision model A3 denoted in Decision Model and Notation (DMN) [33].

These three models are depicted in fig. 2 (ignore the cyan lines for the moment). 
The central ingredient is the process model A1. It represents a simplified version
of the process developed in [51]. The process is triggered by a patient’s appeal 
beginning with an introductory consultation. Afterwards the main part of the 
process begins: Information about the patient and its medical history is extracted 
while in parallel a consultant is selected via a business-rule activity. The 
patient information is then sent to the consultant. The consultant can either 
approve the referral or reject it. In the latter case, another consultant has to be 
found. If a consultant accepts the referral, the process is finished. 


Fig. 2. Example models A1, A2 and A3 and their commonalities
The other models in fig. 2 contain the respective data types (A2) and specify
the domain-specific behaviour of the “Select Consultant” activity (A3). The 
latter is depicted as a table that assigns, for a given combination of values in 
input side columns, a combination of values in output side columns, i.e. based on 
diagnosis and urgency, an appropriate consultant is selected (which is identified 
by a practicionerId and specialization). 

All models could be edited completely independently of each other if there
were no correspondence between them. It arises from the existence of abstractly
“the same” information simultaneously contained in multiple models. Consider
e.g. the column called diagnosis in A3, which is reflected by a process variable
in A1 (visualized by a file symbol) and an attribute named description in A2.
We call these relations commonalities and depict them via cyan lines in fig. 2. 

However, the resulting multimodel (models A1, A2, A3 plus their commonalities)
is subject to consistency rules [11] (see sect. 2), which define consistency of a multimodel.
For our example, assume the following consistency rules:


CR1 For every business-rule activity in A1, there must exist a corresponding
decision table in A3 and vice versa.

CR2 Every column type in A3 must refer to an existing data type in A2 with
the same name.

CR3 Every column in A3 must have a corresponding public attribute (denoted
by +) in A2 and should be reflected by a process variable in A1.

CR4 Every process variable in A1 must be reflected by either a class or an
attribute in A2.
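
To illustrate what checking such a rule amounts to, the following Python sketch evaluates CR2 on toy excerpts of the data model and the decision model. It is a minimal sketch under stated assumptions: the dictionaries and the helper `cr2_violations` are hypothetical, and commonality is simplified to equality of names, which the rule itself prescribes for CR2 but which does not hold for commonalities in general.

```python
# Toy excerpt of the UML data model: names of the available data types (assumed).
uml_data_types = {"String", "Boolean"}

# Toy excerpt of the DMN decision table: column name -> column type (assumed).
dmn_column_types = {
    "diagnosis": "String",
    "urgency": "Boolean",
    "specialization": "String",
}


def cr2_violations(column_types, data_types):
    """Return the columns whose type has no equally named data type (CR2)."""
    return sorted(col for col, typ in column_types.items() if typ not in data_types)


consistent = cr2_violations(dmn_column_types, uml_data_types)
broken = cr2_violations({**dmn_column_types, "age": "Integer"}, uml_data_types)
```

Here `consistent` is empty, whereas adding a column of the undeclared type `Integer` yields one violation. Rules such as CR1, CR3 and CR4 need explicit commonality links in addition to name equality.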


To actually maintain consistency of A1, A2 and A3 w.r.t. CR1-CR4, we begin
with a review of the state of the art on how commonalities are identified and how
consistency is verified and, if needed, restored.


4 State of the Art 


A seminal exposition of the process of multimodel consistency management is 
already given in [43]. It comprises four phases: (i) Detection of overlaps (we call
them commonalities, see sect. 3), (ii) Detection of inconsistencies, (iii) Diagnosis of
inconsistencies, and (iv) Handling of inconsistencies. The first step is also called 
model alignment. Many approaches do not consider an explicit diagnosis stage 
and combine (iii) and (iv) into a phase called consistency restoration a.k.a. model 
repair [28]. Hence, existing work can be grouped into these three categories: 
Alignment The goal of model alignment is to identify relations between 
models, i.e. finding their commonalities. This procedure, a.k.a. model matching, 
has been studied in several domains: databases [35], ontologies [15], MDE [23], 
graph transformation [14] and software product lines [53]. Automatic model 
matching, in general, is NP-hard [36]. However, there may be domain-specific 
heuristics [53] which exploit underlying global identification mechanisms, e.g. 
social security numbers for persons or the ICD-10 ontology [54] for diseases. 
Surveys on this topic can be found in [15] (focus on ontologies), [35] (focus on 
databases) and [23] (focus on MDE). Further, it is important to note that model
element matching requires that elements are transferable between models. This 
is e.g. directly given within the UML or multi-viewpoint modelling as there is 
a single underlying metamodel [3]. If this is not given a priori, matching on the 
level of metamodels [38,10] has to precede the matching of model elements.


Verification The goal of consistency verification is to find all consistency 
violations. A recent survey on this topic is found in [22]. The focus of the 
authors is on UML but the results are universal. They present four categories 
to classify verification approaches: system model (SMV), universal logic (ULV), 
heterogeneous transformation (HTV) and dynamic metamodelling (DMV). In 
the SMV approach every model is translated into a comprehensive artifact where 
the verification is executed. ULV is a variant of the former where the translation 
is executed on the level of an underlying logic. HTV defines translations between
each pair of models and DMV considers extensions of each metamodel with 
elements from other metamodels or models to express global consistency. 


Restoration A comprehensive survey about model repair approaches is 
found in [28], whereas [2] is a recent survey about BX based approaches. Insights 
from these surveys show that there are basically three categories of consistency 
restoration approaches: programming based (PBR) approaches where consistency 
and its restoration are explicitly defined simultaneously, solver based (SBR)
approaches where consistency is abstractly posed as a logic formula and restoration
is implemented using a solver or search-based algorithm, and finally, grammar 
based (GBR) approaches such as TGGs [19], which place themselves somewhere 
in between. The vast majority of these approaches, however, considers binary
synchronization only. There are only a few notable exceptions, e.g. the solver based
Echo [29] and the graph diagram framework [48,49]. 


Architecture Analyzing the underlying system architecture of these approaches,
there are, in principle, two designs: We call them the network design
and the span design. Consider the multimodel as a graph where nodes repre- 
sent models and edges represent correspondences (for alignment), consistency 
relations (for verification) or repair functions (for restoration). In the network 
design there are edges between each pair of models. In the span design the graph 
has a hub-and-spoke layout, i.e. there is an additional hub-node that has an 
edge towards every model. Approaches in the categories SMV, ULV and SBR 
are associated with a span design since they perform a translation into an
intermediate model, while approaches in the categories HTV, DMV and PBR are 
associated with the network design because they directly act on a pair of models. 
GBR approaches have used either of them. 


Comparing the architecture, the network design puts the complexity on the 
edges whereas the span design puts complexity on the nodes (more specifically on 
a single node: the hub). The drawback of the network design is that the number 
of edges grows quadratically with the number of participating models and if 
consistency relations cannot be factored into binary relations, hyperedges are 
required, which further increase the complexity. Another issue with this design is 
the coordination of concurrent changes. The drawback of the span design is the 
additional overhead of the hub-node model; however, the hub-node provides a
means to coordinate concurrent changes. 
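
The growth argument can be made concrete with a small count. The sketch below is illustrative only: the function names are hypothetical, and it assumes one edge per unordered pair of models in the network design and one edge from the hub to each model in the span design (hyperedges are ignored).

```python
def network_edges(n):
    """Number of binary edges between n models in the network design: n choose 2."""
    return n * (n - 1) // 2


def span_edges(n):
    """Number of edges in the span design: one from the hub to each of the n models."""
    return n


# Compare both designs for a growing number of participating models.
comparison = [(n, network_edges(n), span_edges(n)) for n in (3, 5, 10, 20)]
```

For three models both designs need three edges, but for 20 models the network design already needs 190 pairwise edges compared to 20 edges in the span design.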


5 Comprehensive Systems 


In this section, we introduce comprehensive systems (sect. 5.1 to 5.3), which follow 
an SMV approach and mitigate the drawbacks of the span design. We will show in
sect. 5.4 that comprehensive systems are a foundation for the PBR restoration 
approach and we conjecture that the same is true for SBR, because they do not 
fundamentally differ from the structure of local models, such that they can be 
fed into existing means for model verification and restoration. Moreover, sect. 5.5 
briefly reports why our approach eliminates the model merge obstacles (see the
discussion in the introduction and fig. 1). 

Before introducing comprehensive systems concretely, we want to illustrate 
where they occur in typical conceptual workflows for multimodel consistency 
management. Fig. 3 depicts such a workflow which is more or less informally used 
in many approaches of multimodel management, e.g. [16]. It comprises the phases 
mentioned in sect. 4: alignment, verification and restoration. The results of the
first stage are the comprehensive metamodel and global consistency rules imposed
upon it, and metamodel element commonalities, which are stored persistently to 
avoid expensive re-computation and possible information loss, cf. motivation in 
[25]. These commonalities are then used to compute the comprehensive system 
under consideration, e.g. a model merge. It can be used in the subsequent phases 
shown in fig. 3. 

In contrast to this additional computation, our definition of comprehensive 
system is based on a non-intrusive extension of existing models by commonalities 
without extensive computations. Furthermore, it enables natural internalizations of 
inter-relations between different local models into a single artifact. Our intention 
is to demonstrate this internalization informally in this section and formalize 
it in sect. 6, where we will also state that the resulting structure generalizes
triple graphs [40] and graph diagrams [48]; hence it is ready to be used in GBR 
approaches, too. 


[Figure: workflow with the phases Model Alignment, Consistency Verification and Consistency Restoration]

Fig. 3. General Multimodel Consistency Management Process


Fig. 4. Metamodel Example and Base Language


5.1 Typed Local Models 


We begin on the level of metamodels: Fig. 4a depicts a simplified metamodel M1
of BPMN for our example. We do not endorse any specific MDE-framework and
denote metamodels in a UML class diagram-like style. Metamodels M2 and M3
for UML class diagram and DMN models can be defined in the same way as
metamodel M1 (excerpts of them are shown in fig. 5). E-graphs [12] (see fig. 4b)
give a formal interpretation to the class diagram syntax, which may serve as an 
appropriate base modelling language B for our purposes, i.e. a shared linguistic 
(meta-)metamodel [26]. It consists of Graph Nodes GN and Data Nodes DN 
(complex and primitive types in the UML terminology), as well as Graph Edges 
GE (associations) and Node Attribute Edges NAE (attributes) together with 
appropriate owner and target functions. For the sake of simplicity we omitted 
edge attribute edges, which are usually included in E-graphs. Every model A 
must conform to a metamodel M. Since models and metamodels can be depicted 
as E-graphs, the conformance relation is a typing homomorphism t : A → M
between the E-graphs A and M. If, e.g. a is a flow node in A1, see fig. 2, then
t(a) = FlowNode ∈ M1. Hence, model space Mod(M) is the category of E-graphs
typed over M. E-graphs are only one possible base language and we will work
with arbitrary base languages in sect. 6. Nevertheless, we will use the term “graph”
to subsume all artifacts under consideration (models and metamodels). Thus, we 
will use the terms (graph- and data-) “nodes” and (graph- and node attribute-) 
“edges” for the contents of these graphs, see [12] for the original terminology. 

If a set Φ of constraints (e.g. a set of formulas given in a specific logic) is
imposed on M, then the space is reduced to the full subcategory Mod(M_Φ) of all
consistent models typed over M w.r.t. Φ. Besides UML-internal constraints (e.g.
the 1..1-multiplicity on src and tgt in fig. 4a) given in the modelling technique,
there are often attached constraints φ ∈ Φ. An example for an attached constraint
is φ := control_flow, see the note at FlowNode in fig. 4a. This constraint defines
that every Start Event must not have any incoming SequenceFlow [30, p. 237],
whereas an End Event must not have any outgoing SequenceFlow [30, p. 245].
Listing 1.1 shows an Object Constraint Language (OCL) [31] formulation of this 
constraint. 


Listing 1.1. Constraint φ := control_flow formulated in OCL


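context FlowNode inv:
  -- a start event must not have incoming sequence flows
  ((self.oclIsTypeOf(Event) and self.eventType = EventType::START) implies
      self.incoming->size() = 0)
  -- an end event must not have outgoing sequence flows
  and ((self.oclIsTypeOf(Event) and self.eventType = EventType::END) implies
      self.outgoing->size() = 0)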


OCL is just an example of a possible means for defining attached constraints. 
As we do not endorse a specific metamodelling framework and thus also not 
endorse a specific technique for the definition of attached constraints, we treat 
all constraints uniformly and assume that all internal and external constraints 
can be modelled as diagrammatic constraints [37]. A diagrammatic constraint
φ imposed on a metamodel M possesses an “arity graph” S_φ and is imposed on
M by a scope d_φ : S_φ → M (a homomorphism). The semantics is provided by a
predicate check_φ : Mod(S_φ) → Bool, which verifies whether a given structure
typed over the arity fulfills this constraint. The scope highlights a fragment (the
image of d_φ) of metamodel M, e.g. the blue coloured fragment in fig. 4a is the
scope of the constraint φ from listing 1.1. For a typed graph t : A → M, the
verification procedure verify(t) = check_φ(query(t)) comprises two steps: First,
query forgets all elements of A not typed over the scope, then it retypes the
remaining elements w.r.t. d_φ such that they are typed over S_φ. That is, query
implements the pullback of d_φ and t. Finally, check_φ is invoked on the pullback
result.
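
The two-step verification can be sketched for plain directed graphs. The Python code below is a simplified illustration and not the authors' implementation: it ignores data nodes and attribute edges of E-graphs, the classes `Graph` and `Morphism` as well as all element names are hypothetical, and the checked predicate is a [1..1] multiplicity (as on src in the metamodel) rather than the control_flow constraint.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    nodes: set
    edges: dict  # edge id -> (source node, target node)


@dataclass
class Morphism:
    nodes: dict  # node -> image node
    edges: dict  # edge id -> image edge id


def query(model, typing, arity, scope):
    """Pullback of `scope` (arity -> metamodel) and `typing` (model -> metamodel).

    Keeps exactly the model elements typed over the image of `scope` and
    returns them together with their retyping over the arity graph.
    """
    nodes = {(a, s) for a in model.nodes for s in arity.nodes
             if typing.nodes[a] == scope.nodes[s]}
    edges = {}
    for e, (e_src, e_tgt) in model.edges.items():
        for f, (f_src, f_tgt) in arity.edges.items():
            if typing.edges[e] == scope.edges[f]:
                edges[(e, f)] = ((e_src, f_src), (e_tgt, f_tgt))
    retyping = Morphism({n: n[1] for n in nodes}, {e: e[1] for e in edges})
    return Graph(nodes, edges), retyping


def check_exactly_one(result, retyping):
    """Predicate for a [1..1] multiplicity on the arity edge "x" : 1 -> 2."""
    for node in result.nodes:
        if retyping.nodes[node] != 1:
            continue
        outgoing = [e for e, (src, _) in result.edges.items()
                    if src == node and retyping.edges[e] == "x"]
        if len(outgoing) != 1:
            return False
    return True


def verify(model, typing, arity, scope, check):
    return check(*query(model, typing, arity, scope))


# Arity graph with one edge x : 1 -> 2, scoped onto src : SequenceFlow -> FlowNode.
arity = Graph({1, 2}, {"x": (1, 2)})
scope = Morphism({1: "SequenceFlow", 2: "FlowNode"}, {"x": "src"})

# Toy model typed over the (implicit) metamodel.
model = Graph({"f1", "start", "task", "lane1"},
              {"e1": ("f1", "start"), "e2": ("f1", "task"), "e3": ("start", "lane1")})
typing = Morphism(
    {"f1": "SequenceFlow", "start": "FlowNode", "task": "FlowNode", "lane1": "Lane"},
    {"e1": "src", "e2": "tgt", "e3": "lane"})

ok = verify(model, typing, arity, scope, check_exactly_one)

# Dropping the src edge violates the multiplicity.
broken = Graph(model.nodes, {"e2": ("f1", "task"), "e3": ("start", "lane1")})
broken_ok = verify(broken, typing, arity, scope, check_exactly_one)
```

The query result only contains f1, start, task and the src edge e1; the lane and the tgt edge lie outside the scope and are forgotten before the predicate is evaluated.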


5.2 Extending the Base Language 


As seen in sect. 3, consistency rules play a major role in multimodelling. However, 
we cannot directly formalize them via the diagrammatic constraints described 
above since their definition involves elements spanning multiple models. Note 
that inter-relations between models arise from models sharing abstractly the 
“same” real-world concepts (see the intuitive cyan lines in fig. 2). We name these 
structural relations commonalities and they are also well-known in practice 
as traceability links [16,39,1]. There are different interpretations of what such 
a link can mean, e.g. identity, subset, extension etc. [16]. In our framework
commonality semantics are kept abstract, i.e. considering them as any kind of 
structural relation allowing us to define diagrammatic constraints in multimodels. 

For example, in order to formalize CR2, we need to declare a commonality 
between the terms DataType (in M2) and ColumnType (in M3). In addition to
these binary commonalities in which only two terms are matched, there are 
also ternary commonalities, e.g. String occurs in all three metamodels and it is 
necessary to relate BPMN-term ProcessVariable with UML-term Attribute 
and DMN-term Column together with their respective name- and type-features to 
express CR3. These declarations may be formulated in an intuitive domain-specific 
language (DSL) shown in listing 1.2. 
Listing 1.2. Type Commonalities 


commonalities (BPMN, UML, DMN) {
  relate (BPMN.String, UML.String, DMN.String) as String;
  relate (BPMN.Activity, DMN.Table) as Decision;
  relate (BPMN.ProcessVariable, UML.Attribute, DMN.Column)
    as Var with {
      relate (BPMN.name, UML.name, DMN.name) as name;
      relate (DMN.type, UML.type) as type; };
  relate (UML.DataType, DMN.ColumnType) as Type
    with { relate (UML.name, DMN.name) as name; };
  relate (BPMN.ProcessVariable, UML.Class) as Entity; }




The specification in listing 1.2 extends the modelling artifacts $M^1$, $M^2$ and $M^3$ 
and we call its syntax a linguistic extension. Each relate-statement translates 
to an object, which is identified by an alias (keyword as) and which reifies the 
“tupling” of terms it relates. E.g. the object Var in lines 4-7 specifies a commonality 
of the triple ProcessVariable ($M^1$), Attribute ($M^2$), and Column ($M^3$). Var 
is an object in its own right and we call it a (commonality) representative. 

However, not only the nodes (of the graphs) should be related: In listing 
1.2 we see that the keyword with defines the two features, i.e. edges, type and 
name of the respective graphs to be related as well. Common edges require 
that their respective source and target nodes are also related, e.g. the type- 
commonality entails commonality of Attribute and Column, which is already 
given by the surrounding relate-statement, as well as commonality of DataType 
and ColumnType (see lines 8-9). Hence, commonality specifications must preserve 
edge-node-incidences. 

Consequently, it is reasonable to use the same language B for commonality 
representatives. In such a way, a commonality specification is itself an E-graph: 
The semantic interpretation of listing 1.2 is depicted in cyan in fig. 5. The proper 
linguistic extension further comprises mappings, which assign to each commonality 
representative w the elements it relates. E.g. Decision is mapped to Activity 
and to Table in the respective metamodels. Since the assignment syntax in the 
above DSL also contains the target metamodel of the related elements (e.g. BPMN 
in relate(BPMN.Activity...)), these mappings decompose into 3 projection 
mappings $p_j : M^0 \rightharpoonup M^j$ ($j \in \{1,2,3\}$), depicted by dotted arrows in fig. 5, e.g. 
$p_1(\mathrm{Decision}) = \mathrm{Activity} \in M^1$, as well as $p_2(\mathrm{Type}) = \mathrm{DataType} \in M^2$, the 
target metamodel now encoded in $p$'s index. Since the corresponding tuples can 
be of arbitrary arity, these mappings may be partial: 


\[ p_1(w') = \bot, \quad p_2(w') = \mathrm{DataType}, \quad p_3(w') = \mathrm{ColumnType} \]


if $w' = \mathrm{Type}$. Finally, the above required edge-node-incidence means that 
definedness of $p_j(e)$ entails definedness of $p_j(v)$, where $v$ is the source of $e$, and 


\[ p_j(v) = \text{source of } p_j(e) \qquad (1) \]


for all edges $e$ in $M^0$ (and likewise for targets). 
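
Condition (1) can be checked mechanically. The following is a minimal sketch, assuming that graphs are given as dictionaries from edge names to (source, target) pairs and that a partial projection is a dictionary that simply lacks the keys on which it is undefined. The edge names and the fragment of the UML metamodel are hypothetical simplifications of fig. 5.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (source node, target node)


def preserves_incidence(edges0: Dict[str, Edge],
                        edges_j: Dict[str, Edge],
                        p_nodes: Dict[str, str],
                        p_edges: Dict[str, str]) -> bool:
    """Check condition (1) for sources and targets.

    Whenever the projection is defined on an edge e of the commonality
    graph, it must also be defined on the source and target of e, and map
    them to the source and target of the image edge.
    """
    for e, (src, tgt) in edges0.items():
        if e not in p_edges:
            continue  # projection undefined on e: nothing to check
        img_src, img_tgt = edges_j[p_edges[e]]
        # dict.get returns None for undefined nodes, which never matches
        if p_nodes.get(src) != img_src or p_nodes.get(tgt) != img_tgt:
            return False
    return True


# Hypothetical fragment: commonality graph and its projection into UML.
edges0 = {"type": ("Var", "Type"), "name": ("Var", "String")}
edges_uml = {"UML.type": ("Attribute", "DataType"),
             "UML.name": ("Attribute", "String")}
p2_nodes = {"Var": "Attribute", "Type": "DataType", "String": "String"}
p2_edges = {"type": "UML.type", "name": "UML.name"}
```

Removing the entry for Type from p2_nodes makes the check fail, which mirrors the remark that the type-commonality entails the commonality of DataType and ColumnType.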
Fig. 5. Commonality representative metamodel $M^0$ 


5.3 Metamodel and Model Commonalities 


The previous section showed that a linguistic extension of the base language 
with projection functions between commonality representatives and the elements 
they relate yields an alignment of metamodels $M^1, \ldots, M^n$. The result is a 
comprehensive metamodel, in which commonalities are accurately specified with 
the help of (a graph of) commonality representatives. Formally, we obtain a new 
graph $M^0$ and partial projections 

\[ M^0 \overset{p_i^M}{\rightharpoonup} M^i \qquad (2) \]

for all $i \in \{1,\ldots,n\}$. Since all artifacts under consideration (models and 
metamodels) conform to the base $B$, see sect. 5.1, commonalities among models 
$A^1 \in \mathrm{Mod}(M^1), \ldots, A^n \in \mathrm{Mod}(M^n)$ can be encoded in the same way, i.e. there 
is a graph $A^0$ of commonality representatives together with partial projections 

\[ A^0 \overset{p_i^A}{\rightharpoonup} A^i \qquad (3) \]

for all $i \in \{1,\ldots,n\}$. Again they can be specified in the same language as in 
listing 1.2, and can be stored physically, given that the modelling technique offers 
means to identify elements, e.g. primary keys in a database, position in an XML 
document, Uniform Resource Identifiers (URIs) [5], etc. 

The alignment of models $A^1$, $A^2$, and $A^3$ together with their commonalities 
is shown in fig. 2. Each cyan line represents a commonality representative and 
each line ends at the value under the respective projection. Some of the lines 
are binary, some ternary. In general, we would expect any arity, especially when 
the number $n$ of model spaces increases. The complete contents of fig. 2 are called 
a comprehensive system: the cyan connections are its commonalities and the models 
$A^1, \ldots, A^n$ its components. 

Models $A^i$ are typed over their metamodels, i.e. there are typing morphisms 
$t_i : A^i \to M^i$ which can be combined to one big typing of all components. This 
typing extends to $A^0$ as well because elements $a_j$ and $a_k$ ($j \neq k$) of model 
components $A^j$ and $A^k$ are relatable only if their types $t_j(a_j)$ and $t_k(a_k)$ are 
related via a representative $w \in M^0$. Hence, a natural typing $t_0$ of a commonality 
representative $v$ of $a_j$ and $a_k$ is $t_0(v) := w$, such that 


\[ p_j^M(t_0(v)) = p_j^M(w) = t_j(a_j) = t_j(p_j^A(v)), \qquad (4) \]


which shows that the typing extension $t_0$ integrates smoothly (respecting 
commonalities) into a typing of all parts of the comprehensive model, such that we 
end up with a single typed comprehensive system: $t : A \to M$. 


5.4 Reusing Methods of Local Model Management 


Consider the OCL example and its generalization in terms of diagrammatic 
constraints in sect. 5.1. Theorem 1 in sect. 6 will show that comprehensive systems 
constitute a category basically with the same properties as the base language B. 
Especially, pullbacks can be computed in a similar way, see Corollary 1 in sect. 6. 
Thus, we can define the consistency rules CR1-CR4 from sect. 3 as diagrammatic 
constraints $(\varphi_j)_{j \in \{1,\ldots,4\}}$, now imposed on the comprehensive metamodel, which 
treat the commonality witnesses and projections as regular nodes and edges. 
Local constraints can be encoded as global constraints as well [24], such that 
we obtain a comprehensive system $M$ with a set $\Phi$ of constraints spanning 
local model elements but also elements of the linguistic extension. Any typed 
system $t : A \to M$ can then be checked against a constraint $\varphi$ imposed via 
scope $d : S_\varphi \to M$ by pullback of $d$ and $t$ in the category of comprehensive 
systems, see Theorem 1 in sect. 6. Hence, query implementation by pullbacks 
carries over from local models to comprehensive systems and we can reuse the 
theory of diagrammatic constraints to verify global consistency, which e.g. can 
be implemented by a straightforward translation of a respective model fragment 
and constraint to Alloy [20]. This can be used to formally verify that Fig. 2 is 
consistent w.r.t. CR1-CR4. 


5.5 Advantages over Model Merge 


A merged model is an artifact which is computed additionally from local models 
$A^i$. Basically, it is the union of all elements of the $A^i$'s modulo their commonalities, 
see fig. 1. E.g. in the merge of models $A^1$, $A^2$, $A^3$ in fig. 2 there remains a single 
node, say Diag/descr of type Var (a type in $M^0$, see fig. 5), which represents 
sameness of Diagnosis $\in A^1$, description $\in A^2$ and diagnosis $\in A^3$. 

We could implement global consistency rules on the merge by including 
the merge computation in the check-function as described in the algorithm in 
[24]. However, this leads to problems if the verification of a global constraint 
depends on the knowledge of containment in local models. This can be seen with 
consistency rule CR3, which relies on the containment of elements in particular 
local models. After merging Diagnosis and description into the 
single node Diag/descr, distinguishing its original local model would no longer 
be possible. In contrast, we do not lose this differentiation in comprehensive 
systems and can successfully check the validity of this constraint. 
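
The loss of provenance can be made concrete with a small sketch. The record layout below is an assumption chosen for illustration only; the element names are those of the example above.

```python
from typing import Dict, List

# Commonality representative in a comprehensive system: it keeps, for each
# local model index, the element it projects to.
representative: Dict = {
    "name": "Diag/descr",
    "type": "Var",
    "projections": {1: "Diagnosis", 2: "description", 3: "diagnosis"},
}

# The corresponding node in a merged model: the projections are gone.
merged_node: Dict = {"name": "Diag/descr", "type": "Var"}


def local_models(node: Dict) -> List[int]:
    """Indices of the local models known to contain the element."""
    return sorted(node.get("projections", {}))
```

A constraint that asks in which local models an element is contained can be answered from the representative, but not from the merged node.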


6 Categorical Formalization 


This section is devoted to the formalization of comprehensive systems from sect. 5. 
In order to relate comprehensive systems to the TGG framework we need to 
employ category theory (CT) because TGGs are usually formulated in terms of 
CT. We recall the central terminology in the following section and refer to the 
introductory textbooks [4,34,50] for further references about CT. 


6.1 Theoretical Background and Notation 


A category $C$ is a collection of mathematical objects and of morphisms, which 
are means to compare objects. For a category $C$, the set of objects is denoted 
$|C|$ and for each pair $A, B \in |C|$ the (hom-)set of morphisms from $A$ to $B$ is 
denoted by $\mathrm{Arr}_C(A, B)$. For each object $A \in |C|$ there exists a special identity 
morphism $\mathrm{id}_A : A \to A$. Moreover there is a neutral and associative composition 
operation $\circ : \mathrm{Arr}_C(A, B) \times \mathrm{Arr}_C(B, C) \to \mathrm{Arr}_C(A, C)$ for all $A, B, C \in |C|$. The 
most prominent example is the base language of mathematics: Set, the category 
of sets and total mappings. A category $C$ is said to be small, if $|C|$ is itself a set. 
Equivalence of two categories $C$ and $D$, written $C \cong D$, means that the network 
of objects and morphisms in $C$ is identical to the one in $D$ up to isomorphisms 
(e.g. bijections in Set) between objects. 

A functor provides the means to compare two categories $C$ and $D$: It is 
denoted $F : C \to D$ and maps objects of $C$ to objects of $D$ and morphisms 
of each set $\mathrm{Arr}_C(A, B)$ to $\mathrm{Arr}_D(F(A), F(B))$. Moreover, it preserves identities 
and composition. $F$ is called an embedding, if it is injective on objects of $C$ 
and injective on $\mathrm{Arr}_C(A, B)$ for all $A, B \in |C|$. For fixed categories $C$ and 
$D$ and functors $F, F' : C \to D$, a natural transformation $\eta : F \Rightarrow F'$ is a family 
$(\eta_A : F(A) \to F'(A))_{A \in |C|}$ of $D$-morphisms compatible with images of $F$ and $F'$, 
i.e. for all $C$-arrows $f : A \to B$: $\eta_B \circ F(f) = F'(f) \circ \eta_A$. In such a way we get 
a new category, the functor category $D^C$ with objects all functors from $C$ to 
$D$ and arrows the natural transformations. Functors $F : C \to \mathrm{Set}$ where $C$ is small 
play a special role: $F$ assigns to each $S \in |C|$ a (carrier) set $F(S)$ and for every 
$\mathit{op} \in \mathrm{Arr}_C(S, S')$ a mapping $F(\mathit{op}) : F(S) \to F(S')$, i.e. $C$ is a signature (think 
metamodel) that is interpreted by $F$ (think instantiated). Hence, this is also called 
functorial or indexed semantics and $\mathrm{Set}^C$ corresponds to the class of algebras 
for a signature $C$ (instance worlds for a metamodel). E.g. objects of $G := \mathrm{Set}^B$ 
are E-Graphs, if $B$ is the category depicted in fig. 4b (identities are omitted) and 
E-Graph-homomorphisms are exactly the natural transformations. For set-based 
structures, we use the notation $A \hookrightarrow B$ to indicate included structures ($A$ in $B$) 
such as subsets or subgraphs. 
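
The reading of set-valued functors as algebras and of natural transformations as homomorphisms can be sketched as follows. This is an illustration under simplifying assumptions: the signature is that of plain directed graphs (sorts E and V, operations src and tgt) rather than the base of fig. 4b, and the dictionary layout is chosen for this sketch only.

```python
from typing import Dict, Tuple

# A signature lists the generating arrows: op name -> (source sort, target sort).
Signature = Dict[str, Tuple[str, str]]

GRAPH: Signature = {"src": ("E", "V"), "tgt": ("E", "V")}


def is_natural(sig: Signature, F: Dict, G: Dict, eta: Dict) -> bool:
    """Check eta_{S'} . F(op) == G(op) . eta_S for every generating arrow.

    F and G interpret the signature: F["carrier"][S] is a set and
    F["op"][name] a total mapping (dict). eta[S] maps F's carrier of sort S
    to G's carrier of sort S.
    """
    for op, (s, s2) in sig.items():
        for x in F["carrier"][s]:
            if eta[s2][F["op"][op][x]] != G["op"][op][eta[s][x]]:
                return False
    return True


# One edge a : 1 -> 2, mapped onto a loop at x.
F = {"carrier": {"V": {1, 2}, "E": {"a"}},
     "op": {"src": {"a": 1}, "tgt": {"a": 2}}}
G = {"carrier": {"V": {"x"}, "E": {"loop"}},
     "op": {"src": {"loop": "x"}, "tgt": {"loop": "x"}}}
eta = {"V": {1: "x", 2: "x"}, "E": {"a": "loop"}}
```

For this signature the naturality squares state exactly that sources and targets are preserved, i.e. that eta is a graph homomorphism.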

Universal constructions in categories have proven to be of importance in 
many software theoretical methods. Intuitively, universal constructions can be 
described as a generalization of meets and joins in a preorder. Some well-known 
examples of universal constructions in Set are cartesian products or disjoint 
unions (coproducts). It is important to note that Set possesses all these universal 
constructions and thus every category $\mathrm{Set}^C$ does as well, where the computation 
of universal constructions is carried out “pointwise”. 


6.2 Comprehensive System 


We begin the formalization of comprehensive systems by fixing a sufficiently 
large natural number $n$ and considering a synchronization scenario with model 
spaces $(\mathrm{Mod}(M^j))_{j \in \{1,\ldots,n\}}$, e.g. UML class diagrams, BPMN process models 
and DMN tables. 


Definition 1 (Base Modelling Language). The base modelling language is 
a small category $B$. 


In order to distinguish between the different system components, we will work 
with copies $B_j$ of $B$. We let $|B_j| = \{s_j \mid s \in |B|\}$ and similarly $\mathit{op}_j : s_j \to s'_j$ be 
an arrow in $\mathrm{Arr}_{B_j}$, if $\mathit{op} : s \to s'$ is an arrow of $\mathrm{Arr}_B$.¹ 


Definition 2 (Comprehensive Systems, Components, Commonalities). 
A comprehensive system $C$ consists of 


— Functors $C_j : B_j \to \mathrm{Set}$ for each $j \in \{1,\ldots,n\}$, called Components 

— A functor $C_0 : B_0 \to \mathrm{Set}$ determining the Commonality representatives, and 

— A collection of partial functions $(p_{j,s} : C_0(s) \rightharpoonup C_j(s))_{s \in |B|,\, 1 \le j \le n}$, called 
projections, establishing the commonalities of $C$, 


such that for all $\mathit{op} : s \to s' \in \mathrm{Arr}_B$ and $1 \le j \le n$ the following statement holds: 


\[ \text{If } p_{j,s}(x) \text{ is defined, then } p_{j,s'}(C_0(\mathit{op}_0)(x)) \text{ is defined} \qquad (5) \]
\[ \text{and } p_{j,s'}(C_0(\mathit{op}_0)(x)) = C_j(\mathit{op}_j)(p_{j,s}(x)). \qquad (6) \]


Note that (5) and (6) generalize the edge-node-incidences, see sect. 5.2, which 
we already semi-formalized in (1). In the sequel, the index of functors $C_j$ will 
be omitted, since it can be derived from the domain of definition. Hence, a 
comprehensive system is a single functor $C$ with domain the $n + 1$ copies of 
$B$ and $(n + 1) \cdot b$ carrier sets, if $b$ is the cardinality of $|B|$: In view of the introductory 
remarks on functors in sect. 6.1, $C_0, \ldots, C_n$ can be seen as $n + 1$ instance worlds 
for metamodel $B$, e.g. E-Graphs, each with $b = 4$ carrier sets. 
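
To see how (5) and (6) specialize to the incidence condition (1), one can instantiate them for a single base arrow. The following worked instance assumes, for illustration only, that the base contains sorts $E$ (edges) and $V$ (nodes) and an arrow $\mathit{src} : E \to V$; these names are not fixed by the definition above.

```latex
% Instance of (5) and (6) for op = src : E -> V and an edge e in C_0(E):
\begin{aligned}
&\text{If } p_{j,E}(e) \text{ is defined, then }
  p_{j,V}\bigl(C_0(\mathit{src}_0)(e)\bigr) \text{ is defined}\\
&\text{and } p_{j,V}\bigl(C_0(\mathit{src}_0)(e)\bigr)
  = C_j(\mathit{src}_j)\bigl(p_{j,E}(e)\bigr).
\end{aligned}
```

Writing $v := C_0(\mathit{src}_0)(e)$ for the source of $e$, the second line reads: the image of $v$ is the source of the image of $e$, which is condition (1).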

The fundamental linguistic extension is given by the partial functions. They act 
according to our example in sect. 5.2: In the tuple $(p_1(w), \ldots, p_n(w))$ the $p_j$ 
determine sameness of its components based on representative $w$. 


¹ The abbreviation “op” for arrows of the base shall indicate that $B$-arrows are certain 
operations constituting the structure of the base language, such as source and target 
operations of edges in graphs. 

The next definition deals with different comprehensive systems. In this case, 
it is necessary to tell the respective partial mappings apart, such that we write 
$p^C_{j,s}$, if we depict the mappings in the particular system $C$. 


Definition 3 (Homomorphisms between Comprehensive Systems). Let 
$C, C'$ be comprehensive systems as defined in Def. 2. A homomorphism between 
comprehensive systems is a family 


\[ (f_{i,s} : C(s_i) \to C'(s_i))_{s \in |B|,\, 0 \le i \le n} \]


of mappings compatible with arrows, i.e. $\forall i \in \{0,\ldots,n\}, \forall\, \mathit{op} : s \to s' \in \mathrm{Arr}_B$: 
$f \circ C(\mathit{op}_i) = C'(\mathit{op}_i) \circ f$, and compatible with partial mappings: For all $j \in 
\{1,\ldots,n\}$, $s \in |B|$ and $x \in C(s_0)$: 


\[ \text{If } p^C_{j,s}(x) \text{ is defined, then } p^{C'}_{j,s}(f(x)) \text{ is defined and } p^{C'}_{j,s}(f(x)) = f(p^C_{j,s}(x)) \qquad (7) \]

where we write $f$ instead of $f_{i,s}$, if the indexing becomes clear from the context. 


A typical example is a typing morphism $t : A \to M$ for two comprehensive 
systems A and M. Then equation (7) reflects property (4), i.e. compatibility of 
commonalities and typing. This can be seen in fig. 2: The complete contents of it 
is a comprehensive system A typed over the comprehensive metamodel M partly 
depicted in fig. 5. $A^0$ consists of all cyan (binary or ternary) lines and $p_{j,s}$ assigns 
to a line its line end in model $A^j$, where $s$ is the respective element type (node 
or edge). 


Proposition 1. Comprehensive Systems together with homomorphisms between 
them constitute a category CS. 


Proof. An identity is a family of identities, composition is composition of 
mappings $f_{i,s}$. This yields neutrality and associativity. Moreover, composed 
homomorphisms are still compatible with arrows. Whereas this follows in the usual 
way for $\mathit{op} : s \to s'$, transitivity of the definedness implication in (7) also yields 
compatibility with partial functions. 
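
The last step of the proof can be spelled out. The following derivation is a sketch for two homomorphisms $f : C \to C'$ and $g : C' \to C''$ and an element $x \in C(s_0)$; the name $C''$ for the third system is introduced here for illustration.

```latex
% Assume p^{C}_{j,s}(x) is defined.
% By (7) for f: p^{C'}_{j,s}(f(x)) is defined and equals f(p^{C}_{j,s}(x)).
% By (7) for g, applied to f(x): p^{C''}_{j,s}(g(f(x))) is defined, and
\begin{aligned}
p^{C''}_{j,s}\bigl((g \circ f)(x)\bigr)
  &= g\bigl(p^{C'}_{j,s}(f(x))\bigr)\\
  &= g\bigl(f(p^{C}_{j,s}(x))\bigr)
   = (g \circ f)\bigl(p^{C}_{j,s}(x)\bigr).
\end{aligned}
```

Hence $g \circ f$ again satisfies (7).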


6.3 Multimodel Equivalence 


An alternative but closely related approach to our construction is to consider 
commonalities, i.e. commonality representatives $A^0$ together with projections 
$(p_j^A)_{1 \le j \le n}$, not represented internally by means of the modelling technique but 
externally as $n$ spans of morphisms [24,46]. For this, let $G := \mathrm{Set}^B$, see the remarks 
on functor categories in sect. 6.1. The category of resulting artifacts in [46] is a 
subcategory $M$ of the functor category $G^I$, where $I$ is defined as in fig. 6 (identity 
arrows of $I$ are again omitted). It is a subcategory, because it only consists of 
those functors $M : I \to G$, for which the images $M(\subseteq_j)$ of the top arrows in fig. 6 
are monic (i.e. are monomorphisms). 

The proof of the following theorem relies mainly on 
cartesian closedness of the category of small categories, 
i.e. $G^I \cong \mathrm{Set}^{B \times I}$ (internalization) and the fact that spans 
with one monic leg represent partial mappings, the middle 
object of the span being the domain of definition of the 
partial map. A detailed proof of the theorem is given in 
[47]. 


Fig. 6. Cat $I$ 


Theorem 1 (Equivalence of Categories). $CS \cong M$. 


Corollary 1. CS possesses all pullbacks and they are computed separately for 
the commonality representatives and for each component. 


Proof. Follows from Theorem 1 and the fact that functor categories possess all 
pullbacks, their pointwise construction guaranteeing that spans with one monic 
leg are preserved, because pullbacks preserve monomorphisms. 


Auxiliary commonality structures have been used for model synchronization 
in the TGG framework [40]: Consistency relations between two model spaces 
are defined declaratively by a grammar. The grammar rules are defined over 
triple graphs, i.e. pairs of graphs connected by special correspondence-graphs, 
which resemble structural commonalities. From the grammar rules, procedures for 
consistency verification [27], model transformation [13] and (concurrent) model 
synchronization [19,18] can automatically be derived. The solution space, however, 
is limited to binary scenarios. Trollmann and Albayrak [48,49] generalized the 
TGG framework to cope with multiple models within a graph diagram (GD) 
framework. If we assume that the involved models are also objects of the 
graph-like category $G$ (see above), then graph diagrams are the objects of a functor 
category $G^X$, but with a different schema category $X$: It has objects $|X| = R \cup N$ 
and all non-identity morphisms connect a source from $R$ (relations) to a target 
from $N$ (models). There is at most one arrow in $\mathrm{Arr}_X(r, m)$ for fixed $r \in R$ and 
$m \in N$. In such a way graph diagrams, i.e. functors $D : X \to G$, can specify 
relations of different arities. 

They are, however, static: If $r \in R$ has $k$ outgoing morphisms with targets 
$m_1, \ldots, m_k \in N$, $D(r)$ is a $k$-ary correspondence relation with representatives 
which relate exactly one element in each of the $k$ models $D(m_i)$. Consequently, 
the schema category has to change each time a new relation is added! 

Graph diagrams (GD) subsume TGGs, which have schema $X_{TGG} := 1 \leftarrow 0 \rightarrow 2$, 
i.e. $R = \{0\}$ and $N = \{1, 2\}$. Computations of triple graphs (and graph 
diagrams) during rule application as well as decomposing GD rules for forward 
and backward transformations are based on pushout constructions in $G^X$. In 
the rest of the section we show that our framework is more general than graph 
diagrams in that there is an embedding functor $T : G^X \to CS$, the translation 
functor, which preserves pushouts and hence is able to replay all GD computations 
in our framework, yet being able to cope with new relations without changing 
the schema category. 

We use the following notations: For a morphism $f : A \to B$ in a category $C$ we 
write $A = \mathrm{dom}(f)$ and $B = \mathrm{codom}(f)$ for its domain and codomain and we use 
the shorthand notation $\mathrm{Arr}_C(\_, B) := \{f \in \mathrm{Arr}_C \mid \mathrm{codom}(f) = B\}$. We write 
$\coprod_{i \in I} D_i$ to depict the coproduct of a collection $(D_i)_{i \in I}$ of $G$-objects. Note that a 
collection $(f_i : D_i \to D)_{i \in I}$ of morphisms yields the morphism $\coprod_{i \in I} f_i : \coprod_{i \in I} D_i \to D$ 
by the universal property of coproducts, i.e. the morphism, which acts as $f_i$ on 
each $D_i$. 

By Theorem 1, it suffices to define a functor from $G^X$ to $M$. The composition 
of this functor with the equivalence will yield the desired result. This functor 
will also be called $T$. Let a schema category $X$ for graph diagrams be given with 
$|X| = R \uplus N$ and let $n$ be the cardinality of $N$. Without loss of generality, we 
assume $N = \{1, \ldots, n\}$. Let $D$ be a graph diagram, then we define a multimodel 
$M := T(D)$ intuitively as follows (recall the multimodel schema in fig. 6): The 
model components of N are the same as those of $D$, the commonality specification 
$M(0)$ is the disjoint union of all relations in $D$, the middle objects $M(-j)$ are 
the union of those relations, the model $D(j)$ participates in: 


\[
\begin{aligned}
M(j) &:= D(j) && \text{(Models are untouched)}\\
M(0) &:= \coprod_{r \in R} D(r) && \text{(Coproduct of all relations)}\\
M(-j) &:= \coprod_{f \in \mathrm{Arr}_X(\_, j)} D(\mathrm{dom}(f)) && \text{(Participating relations of } D(j)\text{)}
\end{aligned}
\]


for all $j \in \{1, \ldots, n\}$. Furthermore, 


\[
\begin{aligned}
M(p_j) &:= \coprod_{f \in \mathrm{Arr}_X(\_, j)} D(f) && \text{(Projections)}\\
M(\subseteq_j) &: \coprod_{f \in \mathrm{Arr}_X(\_, j)} D(\mathrm{dom}(f)) \hookrightarrow \coprod_{r \in R} D(r) && \text{(Domains)}
\end{aligned}
\]


Hence projections $M(p_j)$ are the unions of the domains of those relating morphisms 
that have target $D(j)$ and inclusions arise from the fact that coproducts in the 
above definition of $M(-j)$ (taken over some relations) are always subgraphs of 
the complete coproduct $M(0)$ (which is taken over all relations). 
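
On the level of plain sets, the object part of the translation can be sketched as follows. This is a simplification under stated assumptions: relations and models are sets instead of graphs, a diagram is given by dictionaries, and the disjoint union is realized by tagging each element with the name of its relation. The function name translate and the data layout are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Tagged = Tuple[str, Hashable]  # (relation name, element)


def translate(rel_sets: Dict[str, Set[Hashable]],
              model_sets: Dict[int, Set[Hashable]],
              arrow_maps: Dict[Tuple[str, int], Dict[Hashable, Hashable]]):
    """Object part of the translation for set-valued diagrams.

    rel_sets[r] is D(r), model_sets[j] is D(j), and arrow_maps[(r, j)] is
    D(f) for the unique arrow f : r -> j, if it exists.
    Returns (M(0), domains, projections), where domains[j] is M(-j) as a
    subset of M(0) and projections[j] is the partial map from M(0) to D(j).
    """
    # M(0): disjoint union of all relations, realized by tagging.
    m0 = {(r, x) for r, xs in rel_sets.items() for x in xs}
    domains: Dict[int, Set[Tagged]] = {j: set() for j in model_sets}
    projections: Dict[int, Dict[Tagged, Hashable]] = {j: {} for j in model_sets}
    for (r, j), mapping in arrow_maps.items():
        for x in rel_sets[r]:
            domains[j].add((r, x))                 # r participates in D(j)
            projections[j][(r, x)] = mapping[x]    # acts as D(f) on D(r)
    return m0, domains, projections


# Hypothetical diagram: one binary relation r1 over models 1 and 2,
# one binary relation r2 over models 2 and 3.
rels = {"r1": {"a"}, "r2": {"b"}}
models = {1: {"u"}, 2: {"v"}, 3: {"w"}}
arrows = {("r1", 1): {"a": "u"}, ("r1", 2): {"a": "v"},
          ("r2", 2): {"b": "v"}, ("r2", 3): {"b": "w"}}
m0, domains, projections = translate(rels, models, arrows)
```

In the result, the projection into model 1 is undefined on the representative stemming from r2, which illustrates how relations of different arities become partial projections over one common set of representatives.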

The definition of $T$ on arrows is straightforward and we give it only 
informally: If $\eta : D \Rightarrow D'$ is an arrow between graph diagrams, then (1) $T(\eta)_i$ is a 
morphism which acts in the same way as $\eta_i$ on $D(i)$, if $i > 0$, (2) it amalgamates 
the actions of $\eta$ on relations, if $i = 0$, which (3) naturally restricts to the 
respective actions, if $i < 0$. It is then easy to see that $\nu := T(\eta)$ is again a 
natural transformation. 


Fig. 7. Pushout in $M$ 


Theorem 2. Functor $T : G^X \to CS$ is an embedding and preserves pushouts. 


For a detailed proof of this theorem consult [47]. To sketch the idea, note 
that we cannot rely on pointwise pushout construction alone: Given a span $(\nu, \mu)$ 
in $M$ as in fig. 7, pointwise pushout construction may fail to belong to $M$! E.g. 
if $\nu$ and $\mu$ are arbitrarily given, then $M^3$ in fig. 7 may not be admissible for $M$ 
because the mapping $M^3(\subseteq_j)$ may fail to be monic, an effect already studied in 
[25, Example 6]. 

Instead the proof uses the fact that naturality squares in $\nu$ are pullbacks, if $\nu$ 
is in the image of $T$. Then hereditariness [17] of pushouts in $G$ yields admissibility 
of $M^3$ and nevertheless allows for pointwise pushout construction. We obtain as 
a consequence: 


Corollary 2. Every sequence of rule applications in $G^X$ has a unique represen- 
tation of corresponding rule applications in CS and hence can be replayed in the 
general framework of comprehensive systems. 


7 Conclusion, Related Work and Future Plans 


Our work can be summarized by the slogan “from many models to one model”: 
Multimodelling is addressed by a construction that yields a single artifact, where 
existing means for consistency verification and restoration can be reused. Over 
many years such global artifacts were computed via merging [38,6,36,10], which 
poses several difficulties especially if the verification of a global constraint de- 
pends on the knowledge of which local model the elements came from. Hence, 
we proposed comprehensive systems that mitigate issues with the former and 
represent a generalization of graph diagrams and triple graphs—alternatives to 
our approach. Comprehensive systems stress the utility of partial mappings in 
commonality specifications, which have been promoted in [46] and were also 
picked up in [25]. 

Related work on multimodel consistency management was surveyed in sect. 4. 
Thus, at this point we mainly want to place our contribution in this landscape. 
Our approach can be considered a structural one and stands in the tradition of other 
approaches based on traceability links. Recent other representatives in this line 
are [16], which uses binary links to relate different artifacts in a practical scenario, 
and [21], which develops a language, similar to ours, for expressing commonalities 
for global consistency restoration. All these works share the requirement for a 
common meta-metalanguage: In our case, given by graph-like structures (presheaf 
topoi). A rather different approach is the framework proposed by Stevens [45]: 
It considers consistency restoration to be performed locally by a builder. The 
concrete implementation of the builder is up to the user and thus there is 
no requirement for a common meta-metalanguage. The global coordination of 
multiple builders is handled by the framework, controlled by an orientation model. 
Comparing Stevens' approach to structural approaches, the former is more abstract 
and thus allows more directions for tooling implementation, whereas structural 
approaches allow formal analysis of the nature of consistency rules. It will be 
worthwhile to investigate the relationship between both approaches in the future. 

This paper provides the framework for performing multimodel consistency 
management by reusing existing restoration techniques. We plan to address the 
current lack of practical evidence by investigating model repair [28] as the 
next step. Being conceptually close to TGGs, grammar-based approaches seem 
a natural fit, but we plan to experiment with solver-based approaches as well, 
further taking into account human interaction and learning. 

Abstract. Refactoring is often needed to ensure that software systems 
meet their performance requirements in deployments with different oper- 
ational profiles, or when these operational profiles are not fully known or 
change over time. This is a complex activity in which software engineers 
have to choose from numerous combinations of refactoring actions. Our 
paper introduces a novel approach that uses performance antipatterns 
and stochastic modelling to support this activity. The new approach 
computes the performance antipatterns present across the operational 
profile space of a software system under development, enabling engineers 
to identify operational profiles likely to be problematic for the analysed 
design, and supporting the selection of refactoring actions when per- 
formance requirements are violated for an operational profile region of 
interest. We demonstrate the application of our approach for a software 
system comprising a combination of internal (i.e., in-house) components 
and external third-party services. 


1 Introduction 


Performance antipatterns [8,31] and stochastic modelling (e.g., using queueing 
networks, stochastic Petri nets, and Markov models [7,16,33]) have long been 
used in conjunction, to analyse performance of software systems and to drive sys- 
tem refactoring when requirements are violated. End-to-end approaches support- 
ing this analysis and refinement processes have been developed (e.g., [4,9,20]), 
often using established tools for the simulation or formal verification of stochastic 
models of the software system under development (SUD). 

While these approaches can significantly speed up the development of sys- 
tems that meet their performance requirements, they are only applicable when 
the SUD operational profile is known and does not change over time. Both of 
these are strong assumptions. In practice, software systems are often used in 
applications affected by uncertainty, due both to incomplete knowledge of and 
to changes in workloads, availability of shared resources, etc. 

In this paper, we introduce a novel performance analysis and refactoring 
approach that addresses this significant limitation of current solutions. The new 
approach considers the uncertainty in the SUD operational profile by identifying 
the performance antipatterns present in predefined operational profile regions. 
These regions capture aleatoric and epistemic operational profile uncertainties 
due to unavoidable changes in the environment (e.g., workload variations) and to 
insufficiently measured environment properties (e.g., CPU speed), respectively. 

A few existing solutions [2,11,19] employ sensitivity analysis to assess the 
robustness of software to variations in its operational profile. However, unlike 
our approach, these solutions do not address major operational profile changes, 
and instead focus on establishing the effect of small operational profile 
variations on the performance of the SUD. In contrast, our new approach provides a 
global perspective on the performance antipatterns associated with a wide range 
of operational profiles. This perspective enables software engineers to identify 
operational profile regions in which their SUD is likely to require refactoring, 
and supports the selection of suitable refactoring actions for such regions. The 
main contributions of this paper are: 


1. We introduce the concept of a performance antipattern profile (i.e., a “map” 
showing the antipatterns present in different regions from the operational 
profile space of a SUD), and a method for synthesising such profiles for 
systems comprising a mix of internal and external software components. 

2. We present a tool-supported approach that uses our performance antipattern 
profile synthesis method, and we define best practices for refactoring the 
architecture of a SUD using performance antipattern profiles. 

3. We demonstrate the application of our approach for a software system com- 
prising a combination of internal (i.e., in-house) components and external 
(i.e., third-party) services. 


The remainder of the paper is organized as follows. Section 2 introduces a 
software system that we use to illustrate the application of our approach through- 
out the paper. Section 3 presents the new approach for the performance analysis 
and refactoring of software systems, and Section 4 describes its application to 
the service-based system from our motivating example. Section 5 compares our 
solution with existing approaches. Finally, Section 6 summarises the benefits and 
limitations of our approach, and suggests directions for future work. 


2 Running Example 


To illustrate the application of our approach, we consider a heterogeneous soft- 
ware system comprising both internal components and external services. We 
assume that the internal components are deployed on the private servers of the 
organisation that owns the system. As such, the architecture and resources of 
these components can be modified if needed. In contrast, the external services 
are accessed remotely from third-party providers and cannot be modified. These 
services can only be replaced with (or can be used alongside) other services that 
are functionally equivalent but may induce different performance. 


2.1 System description 


The system we use as a running example is adapted from [14], and comes 
from the foreign currency trading domain. The workflow implemented by 
this “FOREX” system is shown in Figure 1, and involves handling requests 
sent by currency traders. Two types of requests are possible: requests 
that must be handled in a so-called “expert” mode, and requests handled 
in a “normal” mode. The request type determines whether the system starts 
with a “fundamental analysis” operation or a “market watch” operation. 
Both of these operations use external services. “Technical analysis” is 
an operation provided by an internal component. This operation follows 
the market watch, and determines whether the trader’s objectives 
(specified in the request) are satisfied or not. If there is a conflict 
between these objectives and the results of the technical analysis, then 
the market watch is re-executed. Furthermore, the technical 
analysis may return an error, i.e., an internal “alarm” operation is triggered 
to inform the user about the erroneous result. The optimal results of either 
technical or fundamental analysis (satisfied objectives/trade acceptance) lead to 
the execution of an external “order” operation that completes the trade, and 
is followed by an internal “notification” operation that confirms the successful 
completion of the workflow. 

Fig. 1. Workflow of the foreign currency trading system (FOREX) 


2.2 External services 


For the operations executed using external services, multiple services can be used 
as equivalent alternatives or in some combination deemed suitable. Given n > 1 
functionally equivalent services, three options for combining them are possible: 


— Sequential (SEQ): first invoke service 1; if the invocation succeeds, use its 
response; if it fails, then invoke service 2, etc., until service n is invoked, if 
needed. 
— Parallel (PAR): invoke all n services at once, and use the first result that 
comes back. 

— Probabilistic (PROB): invoke one of the n available services, selected based 
on a discrete probability distribution. 
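
The three options can be compared with a small back-of-the-envelope
calculation. The sketch below is not the Markov-chain analysis used in this
paper. It assumes (hypothetically) that each service is described by a mean
response time and an independent failure probability, that a failed invocation
still takes its full response time, and, for PAR only, that response times are
exponentially distributed and failures are ignored. The class, the function
names and the numbers are illustrative.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Service:
    mean_time: float  # mean response time in seconds (assumed known, e.g. from an SLA)
    fail_prob: float  # probability that one invocation fails (assumed independent)


def seq_expected_time(services):
    """SEQ: service k is invoked only if services 1..k-1 all failed."""
    total, reach_prob = 0.0, 1.0
    for s in services:
        total += reach_prob * s.mean_time
        reach_prob *= s.fail_prob
    return total


def seq_success_prob(services):
    """SEQ succeeds unless every service fails."""
    return 1.0 - prod(s.fail_prob for s in services)


def par_expected_time(services):
    """PAR: the first response wins. The minimum of independent exponentials
    with rates 1/t_i is exponential with rate sum(1/t_i)."""
    return 1.0 / sum(1.0 / s.mean_time for s in services)


def prob_expected_time(services, weights):
    """PROB: one service is picked according to a discrete distribution."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s.mean_time for w, s in zip(weights, services))


# Two hypothetical, functionally equivalent services.
services = [Service(0.04, 0.1), Service(0.02, 0.2)]
seq_time = seq_expected_time(services)
seq_ok = seq_success_prob(services)
par_time = par_expected_time(services)
prob_time = prob_expected_time(services, [0.5, 0.5])
```

Under these assumptions PAR gives the shortest expected response time, at the
price of invoking every service on every request, while SEQ only pays for the
second service when the first one fails.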


Therefore, we need to choose a “good” option (i.e., one that enables the system
to satisfy its performance requirements) starting from information about the
performance characteristics shown by each of these services, which we assume
known from the service-level agreements (SLAs) published by the providers
of these services, from our own observations, or from both. Additionally, we
assume that all these services already satisfy the functional requirements.


2.3 Internal components 


The internal operations are executed by software components belonging to the 
organisation that “owns” the system, and running on their private hardware 
nodes/servers. We assume that technical analysis (TA) has a much more signif- 
icant impact on the performance of the system compared to the other two in- 
house components (alarm and notification), which require only modest resources. 
Consequently, it is necessary to identify possible antipattern-driven refactoring 
actions for the TA component, to ensure that the system operates with an op- 
timal performance. If and when needed, the refactoring actions we consider are: 
(i) duplicate the TA software component and load balance the incoming requests 
between the two TA instances; or (ii) replace the TA instance with a faster one. 
These actions will increase the cost, but may be needed to satisfy the perfor- 
mance requirements of the system. 


2.4 Operational profile parameters 


Several parameters of the system are outside the control of its developers. These
parameters represent the operational profile of the system. For our FOREX
system, they include the probability that a user request needs expert-mode
handling, and the probability of a transaction being performed after the
execution of the fundamental analysis operation (cf. Figure 1). The choice of
these parameter ranges reflects, for instance, the engineers’ expectation about
a particular deployment of the system; numerical values will be provided in
Section 4.


3 Approach 


3.1 Overview 


As shown in Figure 2, our approach to performance analysis and system
refactoring comprises five steps. Starting from an initial system design
proposed by a software engineer, step 1 involves modelling the performance
characteristics of the system across its entire operational profile space
(i.e., for all possible values of the operational profile parameters). As such,
the performance models produced by the modelling step are parametric models,
i.e., models containing (uninstantiated) parameters like the probabilities of
receiving different types of user requests. Our approach is not prescriptive
about the type of performance models that can be used in its modelling step.
However, these models must be able to capture the uncertainty associated with
the operational profile of the system. Therefore, in this paper we will use
parametric discrete-time and continuous-time Markov chains (parametric DTMCs
and CTMCs).

Fig. 2. Performance analysis and refactoring using antipattern profiles


Step 2 of the approach instantiates the parametric performance models for 
combinations of parameter values covering the entire operational profile space. 
A suitable discretization of the continuous parameters is used for this purpose. 
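
A minimal sketch of such a discretization is given below. It assumes that
every operational profile parameter is a probability in [0, 1] and that all
parameters share the same step; the default of ten steps mirrors the 0.1 step
used in Section 4.2, and the helper names are hypothetical.

```python
from itertools import product


def grid(step_count):
    """Return step_count + 1 evenly spaced values covering [0, 1].

    Dividing integers avoids the drift of repeatedly adding 0.1.
    """
    return [i / step_count for i in range(step_count + 1)]


def profile_points(parameter_names, step_count=10):
    """Enumerate every combination of discretized parameter values."""
    axes = [grid(step_count) for _ in parameter_names]
    return [dict(zip(parameter_names, values)) for values in product(*axes)]


points = profile_points(["pExpertMode", "pPerformTransaction"])
```

Each element of `points` is one instantiation of the parametric model. With
two parameters and a 0.1 step the grid contains 11 x 11 = 121 points, and the
number of points grows exponentially with the number of parameters.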


The performance models are then analysed in step 3 to compute the performance
indices corresponding to all considered combinations of operational profile
parameter values. Existing analysis tools suitable for the adopted type of
performance models need to be used in this step; in the case of our DTMC and
CTMC models, a probabilistic model checker such as PRISM [24] or Storm [18]
can be used.¹


Step 4 of the approach uses the performance indices and a portfolio of
antipattern detection rules to identify the performance antipatterns that occur
for different combinations of parameter values. This step produces a series of
maps that show the distribution of such antipatterns across the operational
profile space, thereby highlighting the areas that are problematic from a
performance viewpoint.


Finally, step 5 assesses whether refactoring actions are required, because 
performance antipatterns occur in regions of the operational profile space where 
the deployed system is expected to operate. When refactoring is required, suit- 
able refactoring actions (selected from a repository of such actions) are used to 
update the system design. Updated system designs are then further evaluated 
through re-executing the five steps of the approach, until a design with suitable 
performance antipattern profiles is obtained. 


¹ An estimation of the effort required to create and solve performance models
is outside the scope of this paper, as it may depend on the complexity of the
application domain and on the analysts’ expertise.
Table 1. Detection rule parameters.


Variable       | Scope   | Description
InvReq         | EXT/INT | Number of invocations per request
AvgInvReq      | EXT/INT | Average number of invocations per request
InvTime        | EXT/INT | Number of invocations per time unit
AvgInvTime     | EXT/INT | Average number of invocations per time unit
ServRate       | INT     | Service rate
Util           | INT     | Utilization
AvgUtil        | INT     | Average utilization
UtilThresh     | INT     | Fixed utilization threshold
RespTime       | EXT     | Response time
AvgRespTime    | EXT     | Average response time
PathProb       | EXT/INT | Probability of path execution
AvgPathProb    | EXT/INT | Average probability of path execution
PathProbThresh | EXT/INT | Fixed threshold for probability of path execution


3.2 Detection rules 


The concept of Performance Antipattern was introduced several years ago
[31] to define bad design practices that can induce performance problems in
software systems. This concept was later formalized in first-order logic [17]
and then employed, in the context of Software Performance Engineering
processes, for the purpose of automating the detection and solution of
performance problems [29].

Inspired by the formalization provided in [17], we have here bound the
detection rules of three performance antipatterns to the modeling and analysis
context of this paper. This binding is indeed required for any context, due
to specificities and possible limitations of the notations adopted. In our case,
Markov models of service-based software systems, on the one hand, offer the
advantage of easy deduction of stochastic indices and, on the other hand,
suffer from a lack of separation between software and hardware parameters. The
latter are in fact implicitly taken into account in the execution rates of
operations.

Hereafter we report the formalization of the performance antipattern detection
rules that we have used in this paper, while their parameters are defined in
Table 1, where we also specify whether each parameter is available for external
services (‘EXT’), for internal components (‘INT’), or for both (‘EXT/INT’).


- BLOB 

General description 
It occurs when a component performs most of the work of an application,
thus resulting in excessive components’ interactions that can degrade
performance.

Internal components 
(InvReq > AvgInvReq) ∧ (Util > UtilThresh) ∧ (Util > AvgUtil)

External components 
InvReq > AvgInvReq 

- CONCURRENT PROCESSING SYSTEMS (CPS) 

General description 
It occurs either when too many resources are dedicated to a component
(MAX) or when a component does not make use of available resources (MIN).

Internal components 
MAX - (Util > UtilThresh) ∧ (Util > AvgUtil)
MIN - (Util < UtilThresh) ∧ (Util < AvgUtil)

External components 
MAX - PAR pattern ∧ (RespTime > AvgRespTime)
MIN - PAR pattern ∧ (RespTime < AvgRespTime)

- PIPE AND FILTER (P&F) 

General description 
It occurs when the slowest filter in a “pipe and filter” architecture 
causes the system to have unacceptable throughput. 

Internal and External components 
(InvTime > AvgInvTime) ∧ (PathProb > PathProbThresh) ∧ (PathProb > AvgPathProb)
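
The rules above translate directly into boolean predicates over the parameters
of Table 1. The following sketch is one possible encoding and not the
implementation of our tool; the Python names are hypothetical and the index
values of the example component are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalIndices:
    inv_req: float      # InvReq
    avg_inv_req: float  # AvgInvReq
    util: float         # Util
    avg_util: float     # AvgUtil
    util_thresh: float  # UtilThresh


def blob_internal(x):
    """BLOB rule for internal components."""
    return (x.inv_req > x.avg_inv_req
            and x.util > x.util_thresh
            and x.util > x.avg_util)


def blob_external(inv_req, avg_inv_req):
    """BLOB rule for external services (utilization is not available)."""
    return inv_req > avg_inv_req


def cps_internal(x):
    """CPS rule for internal components: returns 'MAX', 'MIN' or None."""
    if x.util > x.util_thresh and x.util > x.avg_util:
        return "MAX"
    if x.util < x.util_thresh and x.util < x.avg_util:
        return "MIN"
    return None


def cps_external(uses_par, resp_time, avg_resp_time):
    """CPS rule for external services: only defined for the PAR pattern."""
    if not uses_par:
        return None
    if resp_time > avg_resp_time:
        return "MAX"
    if resp_time < avg_resp_time:
        return "MIN"
    return None


def pipe_and_filter(inv_time, avg_inv_time, path_prob, avg_path_prob,
                    path_prob_thresh):
    """P&F rule, shared by internal components and external services."""
    return (inv_time > avg_inv_time
            and path_prob > path_prob_thresh
            and path_prob > avg_path_prob)


# Invented index values for a heavily used internal component.
example = InternalIndices(inv_req=1.8, avg_inv_req=1.1,
                          util=0.92, avg_util=0.40, util_thresh=0.80)
is_blob = blob_internal(example)
cps_kind = cps_internal(example)
```

Note that the internal BLOB rule implies the internal CPS MAX rule, since the
two utilization predicates are shared; a component flagged as BLOB is therefore
always flagged as CPS MAX as well.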


We remark that, in our context, the rules for detecting a specific antipattern 
on internal components may differ from the ones defined for external services. 
This is because the parameters available for external services are obviously more 
limited than those of the internally developed components. For example, the 
whole response time (i.e., service plus waiting time) of an external service is 
usually negotiated in a service-level agreement, but it is difficult to isolate the 
net service time contribution to it, due to lack of control on the execution plat- 
form and the amount of resources dedicated to the service by the provider. Both 
indices can instead be estimated for internal components. As a consequence, 
wherever the service time (or any derived index like utilization) appears in a 
detection rule, the corresponding predicate has to be skipped/modified for ex- 
ternal services. For this reason, in our case BLOB and CPS antipatterns present 
different rules when applied to internal components or external services because, 
as reported in Table 1, utilization cannot be estimated for the latter ones. In 
the BLOB case, the predicates including utilization for internal components are 
simply skipped in the external service formulation, because no other predicate 
would make sense there. Instead, in the CPS case, the predicates on utiliza- 
tion have been replaced with similar ones on response time for external services, 
because the CPS definition is compliant with this modification. 

We highlight that all predicates include parameters that evidently change 
across different areas of the system operational profile (e.g., InvReq, Util), 
hence we expect that the occurrences of the corresponding antipatterns vary 
consequently. The only exceptions are the CPS rules for external services, be- 
cause their parameters and thresholds do not depend on the operational profile. 
Such rules refer to the response time that, for these components, is based on ser- 
vice level agreement, and thus it cannot vary with the operational profile. This 
will evidently reflect on our experimental results, where CPS on external services 
will appear either everywhere or nowhere in the operational profile space. 
3.3 Synthesis of antipattern profiles 


The more software applications are used worldwide by different types
of users, the more difficult it is to estimate a representative average
behavior of users that induces a specific operational profile. In fact, not
only can users have different operational profiles depending on their
locations [15], but even in the same area users’ behavior can (sometimes
radically) change over time [23].

Nevertheless, applications should show acceptable performance across differ- 
ent operational profiles. A motivation for our work is that different operational 
profiles can induce various performance problems, for example because a higher 
execution frequency of a path can overload components involved in that path. 
Hence, the idea is that, in order to identify the most appropriate refactoring 
actions to apply for overcoming performance problems, these problems must be 
identified across different operational profiles. 

In this paper, we introduce the con- 
cept of Performance Antipattern Pro- 
tipatterns. Only with this information in hand, the performance experts can 
suggest appropriate refactoring actions when the system falls within a certain 
operational profile area, or even (in a proactive way) when the system is expected 
to enter a specific operational profile area. 
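
The synthesis of a profile can be pictured as a map from operational profile
points to the set of antipatterns detected at each point. The sketch below is
only an illustration of that data structure: `toy_indices` is a hypothetical
stand-in for the model checker and does not reproduce the FOREX model, and the
single rule `HIGH_UTIL(TA)` is invented for the example.

```python
def synthesize_profile(points, compute_indices, rules):
    """Map each operational profile point to the antipatterns detected there."""
    profile = {}
    for point in points:
        indices = compute_indices(point)
        profile[point] = {name for name, rule in rules.items() if rule(indices)}
    return profile


def toy_indices(point):
    """Hypothetical performance indices: utilization grows with p_EM."""
    p_em, _p_pt = point
    return {"util_TA": 0.2 + 0.7 * p_em, "util_thresh": 0.8}


# One invented rule; a real portfolio would hold the BLOB, CPS and P&F rules.
rules = {"HIGH_UTIL(TA)": lambda ix: ix["util_TA"] > ix["util_thresh"]}

steps = [i / 10 for i in range(11)]
points = [(p_em, p_pt) for p_em in steps for p_pt in steps]
profile = synthesize_profile(points, toy_indices, rules)

# Values of p_EM for which the invented rule fires at least once.
flagged = sorted({p_em for (p_em, _), found in profile.items() if found})
```

Plotting `profile` with one symbol per antipattern yields a map of the
operational profile space in which problematic regions can be read off
directly.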
3.4 Refactoring 


The notational aspects outlined in the previous section for antipattern
detection are reflected in the portfolio of refactoring actions aimed at
removing performance antipatterns. In general, a refactoring action modifies
some available architectural knob (e.g., the number of messages exchanged
between two components, the list of operations provided by a component) to
remove a cause of the antipattern. The type and number of knobs depend on the
adopted notation, and so does the portfolio of refactoring actions.

Our notation distinguishes between internal components and external
services. The two types of system elements are characterized by a few common
parameters and by parameters specific to each type (see Table 1). Therefore,
our portfolio of refactoring actions is partitioned into two sets, as detailed
below.
Actions for internal components 


— Change service rate - The modification of a component service rate can 
be induced by several actions on the system, which could act on the hardware 
platform or on the software architecture, such as: (i) redeploy the component 
to a platform node with different hardware characteristics, (ii) replace some 
devices of the platform node where the component is currently allocated, 
(iii) redesign the software component so that its resource requests change, 
(iv) split a component into two (or more) components and re-deploy them. 

— Change number of threads - This action is always possible where the
control over the number of threads is in the designer’s hands, and for
internal components this is guaranteed.


Actions for external services 


— Change pattern - We have considered three combination patterns for
external services, namely SEQ, PAR, and PROB (see Section 2.2). They are
used to combine (a subset of) the available instances of a certain external
service. This action modifies the combination pattern while keeping the set
of combined services unchanged.

— Change the pattern parameters - Some patterns are regulated by
parameters, in particular: PROB has an invocation probability for each
instance, and SEQ has a failure probability for each instance. A change in
the PROB probabilities is always feasible, because they are under full
control of the designer. Instead, a change in the failure probabilities
within a SEQ pattern implies that the designers are able to make deeper
modifications to the involved instances that can induce different
reliability, and this is not often the case.

— Change combination of service instances - This action replaces some (or
all) of the service instances that are combined to provide a certain
operation, while keeping the combination pattern unchanged.


Of course, the above actions can be combined together to study their joint 
effects on the performance improvement. 


4 Evaluation 


In this section, we first introduce the research questions that we intend to ad- 
dress (see Section 4.1). Thereafter, we describe the experimental scenarios (see 
Section 4.2) and discuss the obtained results (see Section 4.3). We finally
report the threats to validity in Section 4.4. The implemented tool, the
models and the experimental results are available at:
https://github.com/Fase20/automated-antipattern-detection.


4.1 Research questions 


The detection and solution of performance antipatterns largely depend on the
operational profile, which is determined by the end users’ behaviour and thus
can only be known after the system deployment. Naturally, some antipatterns are
more affected than others by the operational profile, which can have a
considerable influence on the software system and, consequently, on its
performance characteristics. Through our experimentation, we aim at answering
the following two research questions:


— RQ1: Does our approach provide insights on the performance antipattern 
profile of a specific design? 


— RQ2: Does our approach support performance-driven refactoring decisions 
on the basis of the performance antipattern profile? 


In order to answer these questions, we apply our approach to the running 
example introduced in Section 2. 
4.2 Experimental scenarios 


Table 2 reports the system parameters of the default configuration we have
used for our experiments. It is structured in three different groups. First,
system settings, i.e., ExtReqs-rate (rate of external requests incoming to the
system), and QueueSize (maximum number of queueing requests). These values are
both set to 10. Second, the rate of internal components and external services,
e.g., TA-rate = 3 is the execution rate of the Technical Analysis (TA) internal
component. For external services, this rate corresponds to the inverse of the
response time (as explained in Section 3.2), and it was obtained through the
analysis of discrete-time Markov chain (DTMC) models of the service
combinations (i.e., SEQ, PAR or PROB) used for the external operations of the
system. The model checker Storm was used to perform this analysis. Third, TA
(as internal component) has a number of threads that is initially set to 1, but
we provide a refactoring action that can change this number to modify the
degree of parallelism of the component.

Table 2. System parameters.

Parameter    | Values
ExtReqs-rate | 10 s^-1
QueueSize    | 10
TA-rate      | 3 s^-1
Alarm-rate   | 40 s^-1
Notif-rate   | 55 s^-1
MW-rate      | [illegible]
FA-rate      | 24.99 s^-1
Order-rate   | 19.09 s^-1
TA-threads   | 1

The operational profile space of our running example (see Figure 1) is fully
defined by the following branching point probabilities: (i) pExpertMode
(p_EM), i.e., the probability of executing the workflow in expert mode; (ii)
pPerformTransaction (p_PT), i.e., the probability of successfully performing
a transaction; (iii) pObjectivesSatisfied (p_OS) and pObjectivesNotMet
(p_ON), i.e., the probabilities of satisfying or not satisfying the
objectives, respectively. As a consequence, 1 - (p_OS + p_ON) is the
resulting probability of an error occurring.

The experimental scenarios that we analyze in the next section include the
variations of p_EM and p_PT within their full range [0,1] with a 0.1 step.
Given the space constraints, we decided to bind (p_OS, p_ON) to three
scenarios, namely: {(0.21, 0.78), (0.48, 0.01), (0.98, 0.01)}, which in the
following we call scenario_A, scenario_B, and scenario_C, respectively.
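
As a worked check of the relation between the two objective probabilities and
the error probability, the snippet below evaluates 1 - (p_OS + p_ON) for the
three pairs listed above, labelled A, B and C. The function name is
hypothetical, and the results are rounded to two decimals only to hide
floating-point noise.

```python
scenarios = {
    "A": (0.21, 0.78),
    "B": (0.48, 0.01),
    "C": (0.98, 0.01),
}


def error_probability(p_os, p_on):
    """Probability that the technical analysis returns an error."""
    p_err = 1.0 - (p_os + p_on)
    if p_err < -1e-12:
        raise ValueError("p_OS + p_ON must not exceed 1")
    return max(p_err, 0.0)


p_error = {name: round(error_probability(*pair), 2)
           for name, pair in scenarios.items()}
# p_error == {'A': 0.01, 'B': 0.51, 'C': 0.01}
```

Under this formula the first and third pair leave a 1% error probability,
whereas the second pair leaves 51%, so in that scenario about half of the
technical analysis executions end in the alarm operation.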

We have considered the following design changes for refactoring purposes:
(R1) - the service rate of the TA internal component can be modified from 3
to 6 jobs per second (i.e., it becomes faster when performing computations)
when TA is detected as an instance of a BLOB performance antipattern; (R2)
- a further thread of the TA component can be added to split the incoming
load and manage users’ requests, again as a solution to a BLOB performance
antipattern on TA; (R3) - change pattern (from SEQ to PAR) and service rate
(from 50.21 to 500) of the MW external service, when MW has been detected as
part of a Pipe and Filter antipattern; (R4) - change service rate (from 40.02
to 400) of the FA external service while keeping the same pattern (i.e., PAR),
which is suggested as a solution to a Pipe and Filter antipattern that
involves FA.
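
One way to picture these design changes is as functions that take a design and
return a modified copy, so that the initial design stays available for
comparison. The sketch below covers only the two TA-related actions, whose
before and after values are stated unambiguously above; it is an assumed
encoding, not the one used by our tool, and the combined candidate is
illustrative rather than one of the evaluated configurations.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Design:
    ta_rate: float = 3.0  # TA service rate in jobs per second (default configuration)
    ta_threads: int = 1   # number of TA threads (default configuration)


def r1_faster_ta(design):
    """(R1): raise the TA service rate from 3 to 6 jobs per second."""
    return replace(design, ta_rate=6.0)


def r2_extra_ta_thread(design):
    """(R2): add one further TA thread."""
    return replace(design, ta_threads=design.ta_threads + 1)


def apply_all(design, actions):
    """Apply refactoring actions in order and return the resulting design."""
    for action in actions:
        design = action(design)
    return design


initial = Design()
candidates = {
    "R1": apply_all(initial, [r1_faster_ta]),
    "R2": apply_all(initial, [r2_extra_ta_thread]),
    "R1+R2": apply_all(initial, [r1_faster_ta, r2_extra_ta_thread]),
}
```

Each candidate can then be passed through the modelling, analysis and
detection steps again to obtain its own antipattern profile.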

The results presented in the next section were obtained using the tool we 
developed to implement the analysis and refactoring process from Figure 2. 
This tool generates antipattern profiles using the antipattern detection rules 
from Section 3.2 and performance indices computed through the probabilistic 
model checking of a continuous-time Markov chain (CTMC) model of the entire 
FOREX system from Figure 1. The model checker Storm is automatically in- 
voked by the tool for this purpose. The tool and the parametric CTMC models 
we used are available in our project’s GitHub repository. 


4.3 Experimental Results 


In order to answer RQ1, we have investigated the occurrence of performance
antipatterns across different operational profiles, so as to obtain performance
antipattern profiles. Figures 4, 5, and 6 report the BLOB, CPS, and P&F
detected antipatterns, respectively, across the operational profile space.
Each figure shows the three considered scenarios for p_OS and p_ON and, for
each scenario, p_EM varies from 0 to 1 (with a step size of 0.1) on the
x-axis, while p_PT varies in the same range on the y-axis. Antipatterns
occurring in each operational profile point are denoted by specific symbols.

We have here considered full ranges of the operational profile parameters, 
even though, in each instant of its runtime, the system will fall in a single point 
of the profile. Therefore, suitable refactoring actions depend on the area where 
the running system profile falls in the considered time. In particular, if it runs in 
an area where antipatterns do not occur, then no refactoring action is suggested. 

In Figure 4(a) we can notice that in scenario_A (i.e., p_OS = 0.21 and p_ON =
0.78) four different components are detected as BLOB antipatterns, specifically:
(i) BLOB(FA) occurs for low values of p_EM only (i.e., up to 0.2); conversely,
(ii) BLOB(TA) occurs for larger values of p_EM; (iii) BLOB(MW) shows a very
similar behaviour with respect to BLOB(TA) except in two corner cases where
it occurs alone; (iv) BLOB(Order) occurs for low values of p_EM and high values
of p_PT only.
Fig. 4. BLOB antipattern instances while varying operational profiles. 

Fig. 5. CPS antipattern instances while varying operational profiles. 


Figure 4(b) interestingly shows that in scenario_B (i.e., p_OS = 0.48 and
p_ON = 0.01), BLOB(TA) and BLOB(MW) occur in a smaller portion of the
operational profile space, i.e., the right-most side (starting at p_EM = 0.7).
The other antipatterns are also subject to the probability changes: both
BLOB(FA) and BLOB(Order) occur in a larger portion of the space, i.e., the
left-most side (up to p_EM = 0.5). This is because scenario_B moves a
considerable part of the workload away from the MW-TA loop, with respect to
scenario_A.

Figure 4(c) illustrates the case of scenario_C (i.e., p_OS = 0.98 and p_ON =
0.01), where further differences appear. In particular, the BLOB(TA)
antipattern no longer occurs, since the higher value of p_OS induces less
computation in TA. BLOB(MW) is confined to three cases of large p_EM values and
low p_PT values. This is because most of the load here goes to FA and Order,
which are in fact more widely detected as BLOB antipatterns.

Figure 5 depicts the CPS antipattern profile that, as compared to the BLOB
one, does not vary considerably across different scenarios. For readability
reasons, CPS(FA)min is not reported in this figure, although it occurs across
the whole operational space for all three scenarios. We recall that this is
due to the CPS detection rule that takes into account the response time for
external services, which does not change with users’ behaviour since it is a
fixed value derived from service-level agreements. CPS(TA)min is not affected
at all by the scenario variations, as it always occurs in the same operational
profile area. Instead, the CPS(TA)max instances progressively decrease when
increasing p_OS. A growth of p_OS, in fact, relieves the MW-TA loop, thus
inducing less imbalance among its components.

Fig. 6. P&F antipattern instances while varying operational profiles. 

Fig. 7. BLOB antipattern instances across different refactorings - scenario_A. 

Figure 6 shows the P&F antipattern profile, where the antipattern instances
refer to execution paths instead of single components/services. Hence,
different symbols represent different paths where one of the
components/services is the slowest filter. For example, MW/MWTAOrNo means that
MW is the slowest filter of the MW-TA-Order-Notification path. Interesting
variations of this antipattern profile appear across scenarios, again driven
by variations in the operational profile parameter values.


Summary for RQ1: Our approach provides insights on the performance
antipattern profile of a specific design. In fact, we are able to identify
considerable variations in the detected antipattern instances while varying
the operational profile parameters.


In order to answer RQ2, we have investigated the occurrence of performance
antipatterns after applying the refactoring actions defined in Section 4.2,
across the operational profile space. The most interesting cases are discussed
hereafter, and specifically: (i) Figures 7 and 8 report the BLOB refactoring
effects on scenario_A and scenario_B, respectively; (ii) Figure 9 illustrates
refactorings for the CPS antipattern in scenario_A; (iii) Figure 10 shows the
P&F refactoring effect on scenario_C.

Fig. 8. BLOB antipattern instances across different refactorings - scenario_B. 

In Figure 7, we can notice the following effects of refactoring actions. Upon
(R1) application, as expected, fewer BLOB(TA) instances appear because this
refactoring consists of doubling the TA computation speed, while all other
instances remain unchanged. (R2) introduces a further TA thread and, in this
case, this induces fewer BLOB(TA) instances because requests are processed
more quickly by the two threads; FA then becomes the overloaded component,
thus inducing more BLOB(FA) instances to appear. (R3) modifies the rate of MW
and makes it much slower, thus inducing the side effect of providing much less
load to TA; in fact all the BLOB(TA) instances disappear, and all the other
instances remain unchanged. (R4) decreases the rate of FA and, similarly to
the above, it has the effect of providing less load to TA; in fact the number
of BLOB(TA) instances decreases.

Figure 8 illustrates the effect of BLOB refactorings on scenario_B. The (R1)
refactoring consists of making the TA component two times faster, hence the
BLOB(TA) instance completely disappears from the operational space, while all
the other antipatterns are not affected. (R2) introduces a further TA thread,
but in this case it is applied in a much less stressed context with respect to
scenario_A. This aspect, together with the fact that two threads cause fewer
requests to be dropped (given that the queue length remains unchanged), in
practice does not relieve TA itself. This is why BLOB(TA) does not disappear.
The decrease of BLOB(Order) instances is very likely due to the fact that, if
performance indices change for some components/services, then their calculated
average values change as well, hence inequalities in detection rules can
change their results due to changes in the right-hand-side targets. (R3),
similarly to Figure 7, modifies the MW rate and makes it much slower, thus
having the effect of providing much less load to TA; in fact all BLOB(TA)
instances disappear. (R4) also behaves similarly to Figure 7.

Fig. 9. CPS antipattern instances across different refactorings - scenario_A. 

Figure 9 depicts scenario_A (i.e., the p_OS = 0.21 and p_ON = 0.78 case) when
considering CPS antipattern instances. We recall that the detection rule for
CPS on external services operates on response time values that do not change
with the operational profile. As a result, CPS(FA)min occurs in the whole
operational space (not only for the initial system, but also after the R1, R2,
and R3 refactorings). Instead, for the R4 refactoring, we found CPS(FA)max
always occurring, and this is due to the nature of this refactoring, which
modifies the FA rate. For the R3 refactoring, besides CPS(FA)min, we also found
CPS(MW)max always occurring, and this is again due to the fact that R3
modifies the MW rate.

In addition to this, we can make the following specific considerations. (R1)
makes the TA component two times faster, hence fewer CPS(TA)max instances
appear, as expected. (R2) introduces a further TA thread but it is not
beneficial for the system; in fact the number of CPS(TA)max instances
increases in the operational profile space. This effect is again very likely
due to the fact that, with two threads, fewer requests are dropped than in the
one-thread case. Hence the work on TA in practice increases. This apparent
anomaly would be mitigated if the number of dropped requests were considered
in the analysis. (R3) decreases the MW rate, so it has the effect of providing
less load to TA; in fact CPS(TA)max instances disappear, and (as mentioned
above) a CPS(MW)max instance appears in the whole operational profile space.
(R4) decreases the FA rate, thus having the effect of increasing the number of
CPS(TA)min instances and decreasing the CPS(TA)max ones.

Figure 10 illustrates scenario_C (i.e., the p_OS = 0.98 and p_ON = 0.01 case)
when considering P&F antipattern instances. Quite small variations can be
observed here, as compared to other antipatterns and scenarios, always limited
to single points of the operational profile space. Some specific comments
follow. (R1) induces fewer P&F instances where TA is the slowest filter and,
on the same path, introduces more instances where Order is the slowest filter.
This is an expected behavior due to the refactoring action that makes TA
faster. (R2) has no effect at all. (R3) modifies the rate of the MW component
and makes it much slower, thus inducing less load to TA. The effect on the P&F
antipattern is minimal and coherent, because one more P&F(MW) instance and one
fewer P&F(TA) instance occur in the same path. (R4) only introduces one more
P&F(MW) instance on the same path as above, and this could be a side effect of
changing the average values of performance indices.

Fig. 10. P&F antipattern instances across different refactorings - scenario_C. 


Summary for RQ2: The approach supports performance-driven refactoring de- 
cisions based on antipattern profiles, in that refactorings determine different 
effects on different regions of the operational profile space. 


4.4 Threats to validity 


Internal validity. In order to spot internal errors in our implementation for au- 
tomatically detecting multiple performance antipatterns, we have thoroughly 
tested it. We verified that the detected performance antipatterns follow the given 
rules defined in their specification, along with the expected performance
indicators. Note that the detection and solution of performance antipatterns
rely on our previous experience in this domain [17], but in the future we are
interested in involving external users, who will be able to add their own
rules for detection and refactoring.

External validity. We are aware that one case study is not enough to
thoroughly validate the effectiveness of our approach. Nevertheless, several
experiments have been performed besides the proposed experimental scenarios,
in order to inspect the large number of variabilities in the operational
profile space that may affect performance characteristics in unexpected ways.
As future work, we would like to better investigate the effectiveness of our
approach by applying it to further case studies (including industrial
applications).


5 Related Work 


In the literature, the operational profile has been recognized as a very relevant 
factor in many domains, such as software reliability [27] and testing [30]. In 
the context of performance analysis of software systems, many techniques 
have been developed to act at: (i) design-time, i.e., providing model-based 
predictions [6,12,32]; (ii) run-time, i.e., using actual measurements derived from system 
monitoring [10,13,35]. Refactoring, instead, is a more recent research direction, 
and many issues arise when modifying different system abstractions [3,26,5]. 
This paper contributes by demonstrating that both performance analysis and 
refactoring are affected by operational profiles, and in the following we review 
the related work pursuing this research direction. 

In [22], a method for uncertainty analysis of the operational profile is pre- 
sented, and the perturbation theory is used to evaluate how the execution rates 
of software components are affected by changes in the operational profile. Our 
approach also considers execution rates, but it is intended to support designers 
in the task of identifying performance-critical scenarios (i.e., when antipatterns 
occur and their evolution when refactoring actions are applied). In [34], 
performance antipatterns are used to isolate the root causes of problems and to 
facilitate their solution; on the TPC-W benchmark, this yielded a relevant increase 
in the maximum throughput, which supports the usefulness of performance antipatterns. 
However, the choice of representative usage profiles is recognized by the authors 
as a limitation of the approach, since no directives are given for this purpose. Our 
approach, instead, is intentionally focused on exploiting the performance an- 
tipatterns while considering the operational profile space as a first-class citizen 
of the conducted analysis. 

The static technique proposed in [25] detects and fixes performance bugs 
(i.e., by breaking out of a loop when a given condition becomes true). It is applied to 
real-world Java and C/C++ applications, and it proved very promising, since a 
large number of new performance bugs were discovered. Like [34], this approach 
neglects the operational profile, which may instead trigger further 
performance problems. In contrast, our goal is to shed light on the 
importance of the operational profile space, and our experimentation demonstrates 
that performance problems and solutions indeed vary across such a space. 

In [21], performance anomalies in testing data are detected through a new 
metric, namely the transaction profile (TP), that is inferred from the testing 
data along with the queueing network model of the testing system. The key 
intuition is that TP is independent of the workload and is sensitive only to variations 
caused by software updates. Our approach also investigates which 
refactorings are more responsible for performance issues, along with the 
characteristics of the operational profile. In fact, refactorings produce regions of the 
operational profile space that are differently affected, and these differences can 
be used by the designers in the task of understanding the suitability of a specific 
design. The work most closely related to our approach is [28], where sequences of code 
refactorings (for Java-like programs) are driven by the avoidance of antipatterns 
(i.e., the BLOB only) and aimed at improving the system security. These 
refactorings consider the attack surface (i.e., how users/attackers access software 
functionalities) as an additional optimization objective. Our approach shares the 
intuition that antipattern-based refactorings are beneficial for software quality 
(i.e., performance in our case) and that the operational profile needs to be part 
of the evaluation, but unlike [28] we target software design abstractions, and we 
provide a global view of the antipatterns encountered by software systems across 
their entire operational profile space. A systematic literature review on software 
architecture optimization methods is provided in [1], but users’ operational 
profiles are neglected. This further motivates our work as a promoter of a research 
line that should foster more attention to the role of users and their effects on 
the available software resources. 

Summarizing, to the best of our knowledge, there is no approach that focuses 
on how the operational profile affects the performance analysis and refactoring 
of software systems, and the idea of adopting performance antipatterns for this 
purpose appears promising according to our experimentation. 


6 Conclusion 


We presented a novel approach that considers the operational profile space of a 
system under development as a first-class citizen in performance-driven analysis 
and refactoring of software systems. Performance antipattern profiles have been 
used to support designers in the nontrivial task of identifying problematic (from 
a performance perspective) areas of the operational profile space, and refactoring 
actions are applied to improve the system performance in such areas. Experi- 
mental results confirm the usefulness of the approach, and show how it can be 
used to evaluate the suitability of a specific design in different regions of the 
operational profile space. 

In addition to the areas of future work mentioned in Section 4.4, we plan to 
extend our approach with the ability to handle reliability and cost constraints, 
and thus to support trade-off analysis among multiple quality attributes. Finally, 
the applicability of the approach could be extended by a portfolio of generic 
refactoring actions (which need to be feasible with our modelling and analysis 
techniques), and methods that automate the selection of suitable actions from 
this portfolio. 
Abstract. Legal compliance is an important part of certifying the 
correct behaviour of a business process. To be compliant, organizations 
might hard-wire regulations into processes, limiting the discretion that 
workers have when choosing what activities should be executed in a case. 
Worse, hard-wired compliant processes are difficult to change when laws 
change, and this occurs very often. This paper proposes a model-driven 
approach to process compliance that combines a) reference models from 
laws, and b) business process models. Both reference and process models 
are expressed in a declarative process language, the Dynamic Condition 
Response (DCR) graphs. They are subject to testing and verification, 
allowing law practitioners to check consistency against the intent of the 
law. Compliance checking is a combination of alignments between events 
in laws and events in a process model. In this way, a reference model 
can be used to check different process variants. Moreover, changes in 
the reference model due to law changes do not necessarily invalidate 
existing processes, allowing their reuse and adaptation. We exemplify 
the framework via the alignment of laws and business rules and a real 
contract change management process. Finally, we show how compliance 
checking for declarative processes is decidable, and provide a polynomial 
time approximation that contrasts with the NP complexity algorithms used in 
compliance checking for imperative business processes. Altogether, this 
paper presents technical and methodological steps that are being used 
by legal practitioners in municipal governments in their efforts towards 
digitalization of work practices in the public sector. 


Keywords: Formal Models of Law, Dynamic Condition Response (DCR) 
graphs, Compliance Checking, Process Calculi, Refinement 


1 Introduction 


Ensuring that business processes comply with applicable laws and regulations 
has been a central concern with the arrival of regulatory technologies (RegTech), 
and brings together different disciplines ranging from legal theory to computer 
science. We understand compliance as the “act/process to ensure that business 
operations, processes, and practices are in accordance with prescriptive (often 
legal) documents” [15]. Checking compliance requires ways to compare artefacts 
coming from very different domains: the legal domain and the process domain. 
On the one hand, the main criterion for a business process is the fulfilment of a 
business goal. On the other hand, processes operate within a regulated context 
that sets certain limitations on how to achieve the goals and defines 
responsibilities for the actors involved. In the public sector, being non-compliant is not an 
option, as regulations determine the rights and obligations of citizens. In 
the private sector, the risk of being non-compliant equates to possibly hefty fines 
for the organization. 

Linking laws and processes poses several challenges. First, how can we formally 
interpret ambiguous regulations written in natural language? Second, how can we pair 
that formal interpretation of the law with a business process? Third, how can we 
reuse legal specifications in different process domains? Fourth, what 
happens to compliance when the laws change? Compliance checking refers to 
the verification procedure that compares regulations and processes. In its simplest 
form, compliance checking can be expressed as the following problem: 
given a formal specification of a law L and a business process P, we say that 
the process is compliant if (1) every action that P performs is in accordance with the 
permissions allowed by L, (2) every execution of P meets the set of obligations 
established by L, and (3) no execution of P does anything prohibited by L. In 
any other case, we say that the process is not compliant. 

In this paper we focus on the compliance checking problem from a mod- 
elling/programming language perspective. First, we explore how declarative pro- 
cess languages can describe the set of requirements expressed in legal documents. 
The challenge is both at the level of language expressiveness (can the language 
express the intended semantics of a legal text?), as well as understandability 
(can a non-expert understand the specification?). Second, we look at the process 
dimension: can we have a general framework that considers different process 
artefacts? Third, we look at the alignment between the legal and the process 
dimension: Can we provide an efficient algorithm to compute whether a process 
is compliant with the legislation? 

In [20], a taxonomy of the requirements needed to formally express laws was 
presented. Overall, a formal language that expresses legal requirements should 
be able to describe what can be done (permissions), what must be done 
(obligations), and what should not happen (violations). Moreover, these so-called 
deontic constraints are effectful: an obligation might grant certain permissions 
(e.g. “you must pay for delivery, but when you do so, you may decide 
whether to pay now or upon delivery”) and, vice versa, a permission may impose 
certain obligations (e.g. “you may park here if you pay later”). The content of 
the laws might also influence the choice of the language. Laws might describe 
constraints related to the control flow, temporal information, data, or resource 
constraints [39]. Finally, the language of choice should be able to describe 
defeasible conditions [18], that is, when parts of the law become irrelevant and are 
superseded by other parts. 

Compliance checking requires a formal representation of business goals and 
processes. Such a representation traditionally takes the shape of traces (cf. 
event logs) at run-time, and of imperative process models at design-time. In the 
imperative paradigm, languages such as BPMN [35] and UML Activity 
Diagrams [34] describe processes as activities and composition operators that 
prescribe the flow of the activities executed in the process. Rules and laws 
are not first-class citizens in imperative models, and they need to be encoded as 
annotations in the process language [13], or paired with additional languages, 
such as BPMN-Q [4]. In contrast, declarative process models focus on the 
description of circumstantial information of processes (e.g. the why of the 
process). Languages such as Declare [37] and Dynamic Condition Response (DCR) 
Graphs [10,22] are some exponents of this type of language. They describe 
a process as a set of constraints between activities which can be translated to 
specific business rules or goals. Their semantics is usually characterised by either 
mapping the declarative model to a flow-based model (e.g. transition systems), 
or by introducing an operational semantics that reasons over the state of the 
different constraints and/or activities of the model. 

The objective of this paper is two-fold. First, it explores whether existing 
declarative process languages are expressive enough to formalise regulations; 
second, it introduces compliance checking via declarative processes. The DCR 
graphs process notation has been developed for the formalisation and 
digitalisation of collaborative, adaptive case management processes. The visual notation is 
supported by a range of formal techniques and serves as the formal base for 
the industrial modelling and simulation tool (www.dcrgraphs.net). In contrast 
to Declare, the DCR graphs technology has been successfully employed in major 
industrial case management systems, and at the moment it supports 70% of the 
Danish Central Government institutions. DCR graphs have been extended to 
include data [43], time [5,24], sub-processes [10], and choreographies [25]. 
In the present paper we consider the core notation with time, which is expres- 
sive enough to represent both regular and omega-regular languages [10] as well 
as so-called true concurrency [9]. In this work we only focus on laws describ- 
ing control-flow and temporal constraints, leaving data, resource constraints or 
inter-law dependencies for future work. 

Our approach for process compliance can be summarised as follows: both the 
legal domain and the business/organisational domain are defined as independent 
DCR graphs, and compliance checking is reduced to process refinement. These 
two independent models allow for a separation of concerns between what is legal and 
what are business/organisational requirements and goals, and they ease compliance 
checking when either laws or organisational processes change. It is worth pointing 
out that, at its core, the chosen process language can be replaced by any 
existing process language (including imperative ones), as compliance checking is 
mainly defined over traces. Changes in regulations might affect existing running 
processes: the typical example is governmental case work, where processes need 
to be revised every time a new regulation is signed. In addition, organisational 
changes or process optimisation efforts might modify a business process in a 
way that makes it no longer compliant with existing laws. Finally, the separation of the 
legal and business domains supports different stages of the compliance life cycle: 
both the design of new processes that are compliant with the laws (e.g. 
Compliance-by-Design (CbD) [14]) and the verification of existing or mined process 
models [33] become possible. 

Contributions This paper presents the first compliance framework for 
declarative process models that 1) can represent safety and omega-regular liveness 
properties, 2) is supported by industrial design and simulation tools, 3) 
is currently in use in the digitalization strategies of municipal governments, 
and 4) allows for a separation of concerns between what is legal and what is 
process-specific. Thanks to having the same formal language for laws and 
business processes, we can use efficient verification techniques based on process 
refinement. This is in contrast to approaches based on annotated imperative 
business processes, where the complexity of compliance checking belongs to the 
non-polynomial complexity class [45]. 

Document Structure Section 2 introduces the compliance framework. Sec- 
tion 3 presents DCR graphs, and illustrates its use on a case study. Section 4 
explains the construction of reference models. Section 5 describes our compliance 
checking technique. Results from validation with organizations are documented 
in Section 6. Related work is discussed in Section 7. We conclude in Section 8. 


2 Regulatory Compliance Framework 


The overall components of our compliance framework are described in Fig. 1. 
It shows the interactions between two different types of roles. The compliance 
officer, with a background in law, identifies the applicable regulations, and for 
each law she generates a reference model. Laws might be abstract, e.g. “Any 
information relating to an identified or identifiable natural person (‘data subject’)” 
(Art. 4 in GDPR [7]). Consequently, the officer might need to combine the law 
with implementation acts (e.g. the Danish Data Protection Act [8]). In this way, 
the specification must narrow down ambiguities such as: “What corresponds to 
any information?”, “In which ways will the process identify a person?” or “Who 
constitutes a natural person?” While the disambiguation process is mostly a 
manual process that depends on the expertise of the compliance officer, computer 
support might provide help in the elicitation phase. Dual-coding tools support 
lawyers in the generation of formal specifications [29], and NLP techniques can 
be used to speed up the identification of process-related information [30]. The 
output will be a collection of reference models, each of them describing a law. 
Each model describes roles, rights, obligations, and the relations between them. 
Fig. 1. Compliance Framework 


Compliance checking assumes the existence of a process. This can be elicited 
from stakeholders via standard techniques [12] or, if the process already exists, 
via process mining [33]. Process models contain the activities performed, roles, 
and resource information (time & data) used. Alternatively, one can consider 
disregarding process discovery and perform compliance checking directly over 
event logs, as in classical process conformance approaches [1]. 


Both reference models and process models are subject to verification and validation 
phases. Scenario replays, reachability and deadlock-livelock checkers provide 
guarantees that the structural properties of both models are preserved. 


The last dimension revolves around compliance, and it constitutes the core of this 
paper. Since reference models are specific to a given regulation, they need to 
be instantiated in terms of the business process. This requires an alignment 
between the events identified in the reference model and the activities in the business 
process. Compliance checking is then reduced to trace refinement: the set of traces of 
the process model must be a subset of the set of traces of the reference model. 
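
The reduction to trace refinement can be illustrated with a minimal sketch. It assumes finite sets of finite traces (the languages considered in this paper may be infinite), and it assumes that activities without a counterpart in the law are simply projected away; the names align and is_compliant, as well as the example events, are hypothetical and not part of the framework.

```python
from typing import Dict, Iterable, Set, Tuple

Trace = Tuple[str, ...]


def align(trace: Trace, alignment: Dict[str, str]) -> Trace:
    """Rename process activities to reference-model events.

    Assumption: activities that the alignment does not mention are dropped.
    """
    return tuple(alignment[a] for a in trace if a in alignment)


def is_compliant(
    process_traces: Iterable[Trace],
    reference_traces: Set[Trace],
    alignment: Dict[str, str],
) -> bool:
    """Trace refinement: every aligned process trace is a reference trace."""
    return all(align(t, alignment) in reference_traces for t in process_traces)


# Hypothetical law: data may only be processed after consent was given.
reference = {(), ("consent",), ("consent", "process")}
alignment = {"ask_consent": "consent", "store_data": "process"}

ok = is_compliant([("ask_consent", "log", "store_data")], reference, alignment)
bad = is_compliant([("store_data",)], reference, alignment)
```

Here ok holds because the aligned trace is allowed by the reference model, while bad fails because data is stored without prior consent. Changing the law only changes the reference set and, possibly, the alignment.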


The separation between reference and compliance models allows for modu- 
lar verification. When laws and processes change, their models can be changed 
separately, only needing to revise the alignment between events and activities. 
Fig. 2. DCR Processes Syntax. 


3 DCR Graphs 


In this section, we recall the syntax and semantics of Dynamic Condition Re- 
sponse (DCR) processes. We use the core term-based definition with time, with- 
out bound events and subprocesses, following the original presentation in [5]. 

We assume a fixed universe of events E ranged over by e, f with a special 
symbol tick ∉ E. A DCR process [M] T comprises a marking M and a term T. Its 
syntax is given in Figure 2. 

A term represents a process model consisting of events (which may be ac- 
tivities, tasks, or the identification of the state of affairs) and their relations. 
In a DCR graph, events are the nodes and relations are the arcs. A marking 
represents the current state of a process by specifying for every event the event 
state (whether the event previously happened, is currently included, and/or is 
pending). A process is then represented by the process model (a term) and its 
current state (a marking). Relations can take the following shape: 


– Condition $e \xrightarrow{t}\!\bullet f$: It defines a prohibition, or a precondition for f. Before f 
can occur, e must have happened at least t time units ago, or e must have 
been excluded. In the case that t = 0, we simply write $e \rightarrow\!\bullet f$. 


– Response $e \bullet\!\xrightarrow{t} f$: It defines an obligation for e. If e has happened, then f 
must occur within t time units, or be excluded. In the case $t = \omega$, this will 
be treated as eventually in LTL, that is, not bounded by any time constraint. 
For such a case we can simply write $e \bullet\!\rightarrow f$. 

– Dynamic Inclusion $e \rightarrow\!+ f$: It defines relevance of an event. After executing 
event e, event f is included among the possible actions to take. Notice that 
the inclusion of f does not imply its necessity (which is captured by a response). 

– Dynamic Exclusion $e \rightarrow\!\% f$: It defines irrelevance of an event. The result of 
executing e is that event f becomes excluded. Moreover, all conditions $f \rightarrow\!\bullet g$ 
and milestones $f \rightarrow\!\diamond g$ are ignored (unless f is included again). 

– Milestone $e \rightarrow\!\diamond f$: A reaction chain. Initially f is included among the possible 
actions, but if e becomes pending, then f cannot occur until e has occurred. 
Finally, term 0 denotes the null process. Note that it is possible to specify a 
relation twice, e.g., $e \rightarrow\!\% f \parallel e \rightarrow\!\% f$; this duplication has no additional effect. 

All relations refer to a marking M, a finite map from events to triples of 
variables (h, i, p), referred to as the event state and indicating whether or not 
the event previously (h)appened, is currently (i)ncluded, and/or is (p)ending. 
A pending event represents an unfulfilled obligation, and the values it can take 
denote whether the event is not pending ($p = f$), has a finite deadline 
($p \in \mathbb{N} \cup \{0\}$), or should eventually be executed ($p = \omega$). We write markings as finite 
lists of pairs of events and event states, e.g. $e_1 : \Phi_1, \ldots, e_k : \Phi_k$, but treat them 
as maps, writing dom(M) and M(e), and understand $M, e : \Phi$ to be undefined 
when $e \in dom(M)$. The free events fe(T) of a term T are simply the set of events 
appearing in it. 
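
As an illustration only, a marking can be represented as a dictionary from events to event states. The sketch below assumes that None encodes the value f (never happened, or not pending) and that math.inf encodes ω; the names EventState, OMEGA and extend are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

OMEGA = math.inf  # assumption: ω (no finite deadline) is encoded as infinity


@dataclass(frozen=True)
class EventState:
    happened: Optional[int]   # None encodes f; otherwise time since last execution
    included: bool            # whether the event is currently included
    pending: Optional[float]  # None encodes f; a number is a deadline; OMEGA is "eventually"


Marking = Dict[str, EventState]


def extend(marking: Marking, event: str, state: EventState) -> Marking:
    """Build the marking 'M, e : state'; undefined if e is already in dom(M)."""
    if event in marking:
        raise ValueError(f"{event} is already in the domain of the marking")
    extended = dict(marking)
    extended[event] = state
    return extended


# Two events of the running example: A1 has not happened, A5 is pending
# with a deadline of 30 time units.
m = extend({}, "A1", EventState(happened=None, included=True, pending=None))
m = extend(m, "A5", EventState(happened=None, included=True, pending=30))
```

Keeping event states immutable means that executing an event or letting time pass produces a new marking, which matches the inductive definitions used in the semantics.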

With respect to the original presentation [5], our syntax extends the process 
definition with labels. A labelling $\lambda$ defines a total function from events to labels. 
However, we often omit the labelling function, as it rarely changes, writing $[M]\,T$ 
instead of $[M]\,\lambda\,T$. We assume that event labels are unique, i.e., if $e, f \in fe(T)$ 
then $\lambda(e) \neq \lambda(f)$ or $e = f$; therefore, $\lambda$ has an inverse, which we denote by 
$\lambda^{-1}$. A substitution $\sigma = \{e_1, \ldots, e_n / f_1, \ldots, f_n\}$ replaces each event $e_i$ 
with $f_i$, for $1 \leq i \leq n$ and with the $e_i$ pairwise distinct. The application of $\sigma$ to a 
process term T is denoted by $T\sigma$, and it applies similarly to markings and to 
processes, with $([M]\,T)\sigma = [M\sigma]\,T\sigma$. We require of a process $P = [M]\,\lambda\,T$ 
that $fe(T) \subseteq dom(M) = dom(\lambda)$, and so define $fe(P) = dom(M)$. The alphabet 
alph(P) is the set of labels of its free events. 


Example 3.1. We use a contract change management process from the construc- 
tion industry as our running example. The process model in Fig. 3 has been ex- 
tracted from structured interviews with domain specialists, and then validated 
in a workshop. We will focus on the most salient aspects of the process, and 
refer the reader to [2] for the complete specification. The process includes three significant 
roles: a subcontractor, a project manager and a trade package manager (TPM, 
external to the organization), collaborating via a document management 
system. The process starts when the subcontractor notices that additional work 
is required compared to an original construction contract. To be paid for the 
extra work, it is their responsibility to justify using supportive documentation 
(A1). Hence, the subcontractor submits a change management request on the 
platform (A2). Further, the TPM must notify the subcontractor that his request 
has been initiated (A5), as well as checking the request specifications against the 
initial contract requirements and the technical documentation (A4). Once the 
request is checked, the TPM can decide whether to accept the change request 
(A7), to reject the request (A8) or to ask for additional documents that 
support the subcontractor’s claim (A6). If the TPM decides to reject the claim, she 
must attach reasoning for the decision and communicate it to the subcontrac- 
tor. Next, the subcontractor can evaluate the rejection (A16). If there is need 
for further documentation to support the claim, the TPM must send a request 
for additional information (A1). If the TPM agrees with the change, she must 
forward documentation describing what changes from the initial contract to the 
Fig. 3. Contract Change Management Process Pspec 


project manager. The project manager must evaluate the request (A10). He is 
responsible for taking the final decision, whether to accept (A11) or reject (A12) 
the request. In case of rejection, the project manager must notify the subcon- 
tractor about the decision and substantiate with reasoning (A14). Besides, if the 
answer is an acceptance, the project manager is responsible for sending an up- 
dated contract form (A13). Once the new contract is received, the subcontractor 
must attach it to the old contract (A15). As part of the DMS capabilities, the 
subcontractor is allowed to cancel the change request (A3) at any point after 
submission, with the effects of deleting the application (A17). 

The diagram in Fig. 3 provides a visual representation of process Pspec de- 
scribed above. Events are denoted via boxes, and arrows describe the relations 
introduced in the previous section. Each event has a label presenting its descrip- 
tion, as well as the role of the agent(s) that can execute the event. An included 
event is represented with a solid border, with a dashed line if it is excluded. 
Included events can be executed at any time (unless they become excluded), 
and, unless preceded by a response relation, they can also be left unexecuted. 
Relations can point to events or to event “collections” (boxes marked with “n”). 
As formalised in [23], such collections are referred to as “nestings” and are just 
a visual shorthand, understanding arrows to (from) nestings to represent arrows 
to (from) every event inside the nesting. 


The process is available for simulation and execution at 
https://www.dcrgraphs.net/tool/main/Graph?id=43ea382d-de1b-4278-8eff-591426244qd90 
Fig. 4. Enabling & effects. We write “_” for “don’t care”, i.e., either true t or false f 


We point to some of the behavioural aspects in the model. The condition 
relation between A1 and A2 forbids the subcontractor to perform a submission 
without documentation. The exclusion relation to itself in A1 says that such 
activity can be done once per case, and it will cease to be available until it 
is included again (via the execution of A6). The response between “Decide on 
change request" and “Take action” says that once the activities A11 or A12 have 
been performed, it is obligatory to execute the included activities in the take 
action part. Only one decision can be taken per round, as the execution of A11 
and A12 exclude each other. The chain of milestones and responses between A10 
and A15 ensures that the attached copy only corresponds to the most updated 
decision: every time a project manager executes A10, the activities inside “de- 
cide on change request" become pending. This will inhibit any action until the 
decision has been revised. Finally, the timed response between A4 and A5 says 
that notification must be done within 30 time units of the execution of A4. 


3.1 Semantics 


We first define when an event is enabled and what effects it has if executed. The 
judgement $[M]\,T \vdash e : (Exc, Inc, Pen)$, defined in Figure 4, should be read: “in 
the marking M, the term T allows the event e to happen, with the effects of 
excluding events Exc, including events Inc, and making events Pen pending.” 
The first rule says that if e is a condition for f, then f can happen only if (1) 
it is itself included, and (2) if e is included, then e happened at least k steps ago. 
The second rule says that if e is a milestone for f, then f can happen only if (1) it 
is itself included, and (2) if e is included, then e must not be pending. The third 
rule says that if f is a response to e and e is included, then e can happen with the 
effect of making f pending with a deadline of k. The fourth (respectively fifth) 
rule says that if f is included (respectively excluded) by e and e is included, 
then e can happen with the effect of including (respectively excluding) f. The 
sixth rule says that for an unconstrained process 0, an event e can happen if 
it is included. The seventh rule says that a relation allows any included event 
e to happen without effects when e is not the relation’s right-hand-side event. 
\[
\frac{[M]\,T \vdash e : \delta}{T \vdash M \xrightarrow{e} \delta(e(M))}\ [\textsc{Event}]
\qquad
\frac{deadline(M) > 0}{T \vdash M \xrightarrow{tick} tick(M)}\ [\textsc{Time}]
\]


Fig. 5. Transition semantics. 


Finally, the last rule says that enabledness for parallel composition depends on 
its constituents (we omit symmetric rules for sake of clarity). 
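
The enabling rules can be paraphrased by the following sketch, which is an illustration rather than the implementation behind the DCR tools. It assumes a term is flattened into a list of relations, an event state is a triple (h, i, p) with None encoding f, and the names State, Relation, enabled and effects are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class State:
    happened: Optional[int]   # None encodes f; otherwise time since last execution
    included: bool
    pending: Optional[float]  # None encodes f (not pending)


@dataclass(frozen=True)
class Relation:
    kind: str        # "condition", "milestone", "response", "include" or "exclude"
    source: str
    target: str
    time: float = 0  # delay for conditions, deadline for responses


def enabled(marking: Dict[str, State], relations: List[Relation], event: str) -> bool:
    """An included event is enabled unless an included source blocks it."""
    if not marking[event].included:
        return False
    for r in relations:
        if r.target != event or not marking[r.source].included:
            continue  # constraints from excluded sources are ignored
        src = marking[r.source]
        if r.kind == "condition" and (src.happened is None or src.happened < r.time):
            return False
        if r.kind == "milestone" and src.pending is not None:
            return False
    return True


def effects(relations: List[Relation], event: str) -> Tuple[Set[str], Set[str], Set[Tuple[str, float]]]:
    """Collect (Exc, Inc, Pen) produced by executing the given event."""
    mine = [r for r in relations if r.source == event]
    exc = {r.target for r in mine if r.kind == "exclude"}
    inc = {r.target for r in mine if r.kind == "include"}
    pen = {(r.target, r.time) for r in mine if r.kind == "response"}
    return exc, inc, pen


# Fragment of the running example: A1 is a condition for A2 and excludes
# itself; A4 has a timed response to A5 with a deadline of 30.
rels = [
    Relation("condition", "A1", "A2"),
    Relation("exclude", "A1", "A1"),
    Relation("response", "A4", "A5", 30),
]
fresh = State(None, True, None)
marking = {"A1": fresh, "A2": fresh, "A4": fresh, "A5": fresh}
```

In this fragment, A2 is not enabled in the initial marking because A1 has not happened, and executing A4 yields the effect of making A5 pending with deadline 30.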

Given enabling and effects of events, we define the action of, respectively, an 
event e and an effect $\delta = (Ex, In, Pe)$ on a marking M pointwise by the action 
on individual event states $f : (h, i, r)$ as follows. Assume e is enabled in the 
process $[M]\,T$ with effect $\delta = (Ex, In, Pe)$. The state of e records that the event 
has happened now, setting its executed flag to 0. Similarly, we record that it is 
no longer pending. The effect of executing e in a marking M, written e(M), is 
inductively defined as follows: 


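\[
e(M) = \begin{cases}
\epsilon & \text{if } M = \epsilon \\
e(N),\ f : (0, i, f) & \text{if } M = N, f : (\_, i, \_) \wedge e = f \\
e(N),\ f : (h, i, r) & \text{if } M = N, f : (h, i, r) \wedge e \neq f
\end{cases}
\]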


The application of an effect $\delta = (Ex, In, Pe)$ to a marking M, denoted $\delta(M)$, is 
inductively defined as follows: 

\[
\delta(M) = \begin{cases}
\epsilon & \text{if } M = \epsilon \\
\delta(N),\ f : (h,\ (i \wedge f \notin Ex) \vee f \in In,\ r') & \text{if } M = N, f : (h, i, r)
\end{cases}
\]

where $r' = \min\{d \mid (f, d) \in Pe\}$ if $(f, d) \in Pe$ for some d, and $r' = r$ otherwise. That is, the 
event stays included (second component) only if $f \notin Ex$ (it is not excluded) or 
$f \in In$ (it is included). The pending flag takes the minimal deadline d for which 
$(f, d) \in Pe$; otherwise, the flag is unchanged. Note that an event can be 
both excluded and included by the effect; conceptually, the exclusion happens 
first, followed by the inclusion. 
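
The two definitions above can be read as the following sketch. It assumes a marking is a dictionary from events to triples (h, i, r), with None encoding f; the function names execute and apply_effect are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

EventState = Tuple[Optional[int], bool, Optional[float]]  # (h, i, r); None encodes f
Marking = Dict[str, EventState]


def execute(marking: Marking, event: str) -> Marking:
    """e(M): record that the event happened now and is no longer pending."""
    return {
        f: ((0, i, None) if f == event else (h, i, r))
        for f, (h, i, r) in marking.items()
    }


def apply_effect(
    marking: Marking,
    exc: Set[str],
    inc: Set[str],
    pen: Set[Tuple[str, float]],
) -> Marking:
    """delta(M): exclusion first, then inclusion, then the minimal new deadline."""
    result = {}
    for f, (h, i, r) in marking.items():
        included = (i and f not in exc) or f in inc
        deadlines = [d for (g, d) in pen if g == f]
        result[f] = (h, included, min(deadlines) if deadlines else r)
    return result


# A4 fires and makes A5 pending; of two competing deadlines the smaller wins.
m0 = {"A4": (None, True, None), "A5": (None, True, None)}
m1 = apply_effect(execute(m0, "A4"), set(), set(), {("A5", 30), ("A5", 45)})
```

After this step m1 maps A4 to (0, True, None) and A5 to (None, True, 30). An event that is both excluded and included by the same effect ends up included, as noted above.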

The transition semantics requires us to account for the time that has passed 
between events. The deadline function is inductively defined over markings: 


\[
deadline(M) := \begin{cases}
\omega & \text{if } M = \epsilon \\
\min\{p', deadline(M')\} & \text{if } M = M', e : (h, i, p)
\end{cases}
\]


With p’ taking the value of p if i = t, otherwise p’ = w. Basically, only deadlines 
of included events are considered. The deadline function sets a lower limit for 
events to happen. Moreover, we need to update the marking by incrementing 
the time after an event has fired. The tick function is inductively defined over 
markings with such purpose: 


tick(M) = € if M=e 
= ~ | tick(M’),e: (h+1,i,max{0,p—1}) if M = M’,e: (h,i,p) 


Extending the + and — operators such that f +1 = f and f — 1 = f, and 
w—-l=w. 
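As an illustration of the deadline and tick functions, the sketch below again assumes a dictionary representation of markings with states (h, i, p). It encodes the flag f as `None` and ω as `math.inf`, and it treats a non-pending event as contributing no deadline; these encodings are assumptions made for the example, not part of the formal definition.

```python
import math
from typing import Dict, Optional, Tuple

# State (h, i, p): time since last execution (None if never executed),
# inclusion flag, deadline (None if not pending, math.inf for omega).
State = Tuple[Optional[int], bool, Optional[float]]


def deadline(marking: Dict[str, State]) -> float:
    """Smallest deadline among included, pending events; omega if there is none."""
    return min(
        (p for (_, included, p) in marking.values() if included and p is not None),
        default=math.inf,
    )


def tick(marking: Dict[str, State]) -> Dict[str, State]:
    """Advance time by one unit: h grows, deadlines shrink but never below 0."""
    return {
        e: (
            None if h is None else h + 1,           # f + 1 = f
            included,
            None if p is None else max(0, p - 1),   # f - 1 = f, omega - 1 = omega
        )
        for e, (h, included, p) in marking.items()
    }
```

Using `math.inf` for ω makes the extension ω − 1 = ω hold automatically under floating-point arithmetic.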

Figure 5 introduces the transition semantics of processes. In rule [EVENT],
the marking M fires an enabled event e, generating as a result a marking M'.
Note that transitions are non-deterministic: more than one event can be enabled
in M. In rule [TIME], the marking M is advanced by one time unit, generating M''.
Intuitively, a transition $T \vdash M \xrightarrow{e} M'$ expresses that process
[M] λ T fires an event e and modifies its marking to M'. As customary, we denote
with $\rightarrow^{*}$ the transitive closure of $\rightarrow$. Moreover, we
define the state space of [M] T as
$P([M]\,T) = \{[M']\,T \mid T \vdash M \rightarrow^{*} M'\}$. Event transitions
give rise to a labelled transition system
$lts([M]\,\lambda\,T) = (P([M]\,T), [M]\,T, E', \rightarrow, \Sigma, \lambda')$,
where $[M]\,T \in P([M]\,T)$ is the initial state, $E' = E \cup \{tick\}$ is the
set of labels, $\rightarrow\ \subseteq P([M]\,T) \times E' \times P([M]\,T)$ is
the transition relation, Σ is an alphabet, and $\lambda' \subseteq E' \times \Sigma$
is a labelling function defined by λ'(e) = λ(e) for e ∈ E, and λ'(tick) = tick.

We equip the LTS with a notion of accepting runs, adapting similar notions
defined for DCR Graphs [6,32] to the timed setting:


Definition 1 (Runs, Accepting Runs). A run of [M] T is a finite or infinite
sequence of transitions $[M]\,T = [M_0]\,T_0 \xrightarrow{e_0} \cdots$. A run is
accepting iff for every state $[M_i]\,T_i$, when $M_i(e) = (\_, \mathsf{t}, \mathsf{t})$
then there exists $j \geq i$ s.t. either $M_j(e) = (\_, \mathsf{f}, \_)$ or
$[M_j]\,T_j \xrightarrow{e} [M_{j+1}]\,T_{j+1}$.


In other words, in an accepting run every included, pending event is eventually
either executed or excluded. Note that since an event e may happen more
than once, even processes with only finitely many events may have infinite runs.
Having defined the LTS and runs, we can define the language of a DCR
process to be its set of accepting runs.
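For finite runs, the acceptance condition can be checked directly. The sketch below is an illustration under stated assumptions: a run is given as a list of markings M_0, ..., M_n (dictionaries over the same set of events, with states (h, i, p)) together with the list of fired labels e_0, ..., e_{n-1}, and an event counts as pending when its third component is not `None`. The function name `is_accepting` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

State = Tuple[Optional[int], bool, Optional[float]]  # (h, included, deadline)


def is_accepting(markings: List[Dict[str, State]], events: List[str]) -> bool:
    """Check Definition 1 on a finite run.

    markings[i] is the marking before transition i, and events[i] is the
    event (or "tick") fired by transition i, so len(events) == len(markings) - 1.
    """
    for i, marking in enumerate(markings):
        for e, (_, included, pending) in marking.items():
            if not (included and pending is not None):
                continue  # only included, pending events impose an obligation
            excluded_later = any(
                not markings[j][e][1] for j in range(i, len(markings))
            )
            executed_later = any(events[j] == e for j in range(i, len(events)))
            if not (excluded_later or executed_later):
                return False
    return True
```

Infinite runs cannot be checked this way; the sketch only covers the finite case.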


Definition 2 (Traces). A trace of a process [M] λ T is a possibly infinite
string $s = (s_i)_{i \in I}$ s.t. [M] T has an accepting run
$[M_i]\,T_i \xrightarrow{e_i} [M_{i+1}]\,T_{i+1}$ with $s_i = \lambda(e_i)$.
Finally, the process [M] T has the language
$lang([M]\,\lambda\,T) = \{s \mid s \text{ is a trace of } [M]\,\lambda\,T\}$.


4 Compliance Rules 


Not all law paragraphs are created equal. Different articles describe definitions, 
commencement periods, amendments, and other provisions. We focus on self- 
contained procedural articles, those paragraphs that do not depend on the state 
of affairs of events described in other paragraphs. One example is GDPR Art. 
21 §1: 


(Right to Object) §1. The data subject shall have the right to object, on grounds re- 
lating to his or her particular situation, at any time to processing of personal data 
concerning him or her [...]. The controller shall no longer process the personal data 
unless the controller demonstrates compelling legitimate grounds for the processing 
which override the interests, rights and freedoms of the data subject or for the
establishment, exercise or defence of legal claims.


Legal Text | Policy | Compliance Rule

Rule RC1
  Legal text: GDPR Art. 21 §1.
  Policy: If the subcontractor submits a change request, he may cancel it
    afterwards. [...] project manager must eventually [...] request.
  Compliance rule: $RC_1 = [\ldots]$
    $\lambda_1(e_1)$ = “A2: submit a change request”
    $\lambda_1(e_2)$ = “A3: cancel change request”
    $\lambda_1(e_3)$ = “A17: delete the request”

Rule RC2
  Legal text: [...] controller [...] must at the time of undertaking the
    recording of personal data [...] provide the data subject with at least the
    following information [...].
  Policy: After the subcontractor submits a change request, eventually the PM
    will notify the subcontractor about the processing of request, including
    the personal data used.
  Compliance rule: $RC_2 = [e_4 : (f,t,f),\ e_5 : (f,t,f)]\ \lambda_2\ T_2$
    $T_2 = e_4 \bullet\!\!\rightarrow e_5 \parallel e_4 \rightarrow\!\!\bullet\, e_5$
    $\lambda_2(e_4)$ = “A2: Submit change request”
    $\lambda_2(e_5)$ = “A5: Notifies processing to subcontractor”

Rule RC3
  Legal text: Organization SIP. A change request should take a maximum amount
    of time, otherwise it becomes invalid.
  Policy: The change request is valid for 60 working days and afterwards it is
    closed.
  Compliance rule: $RC_3 = [e_6 : (f,t,f),\ e_7 : (f,f,f),\ e_8 : (f,t,f)]\ \lambda_3\ T_3$
    $T_3 = e_6 \rightarrow\!\!+\, e_7 \parallel e_6\,[\ldots]\,e_7 \parallel e_6\,[\ldots]\,e_8 \parallel e_8 \rightarrow\!\!\%\, e_7$
    $\lambda_3(e_6)$ = “A2: Submit change request”
    $\lambda_3(e_7)$ = “Finish Processing request”
    $\lambda_3(e_8)$ = “Cancel Processing”


Fig. 7. Elicitation of Compliance Rules


We observe dependencies between two events, (B1) processing of personal data,
and (B2) the right to object. We also observe the consequences of applying B2.
For the sake of clarity we assume “no longer process personal data” as the
event (B3) “stop processing”. The process for Art. 21 §1 is:

$RF_1 = [B_1 : (f,t,f),\ B_2 : (f,t,f),\ B_3 : (f,t,f)]\ B_1 \rightarrow\!\!\bullet\, B_2 \parallel B_2 \bullet\!\!\rightarrow B_3$


Event in Legislation      | Activity/event in Process Model
B1: Process personal data | A2: Submit change request
B2: Right to object       | A3: Cancel request
B3: Stop processing       | A17: Delete request

Fig. 6. Instantiation of Art. 21 GDPR for [...]


The reference model requires a mapping from abstract rights such as “right
to object” into activities/events in the business process. Further knowledge from
implementation guidelines is used to determine the proper mapping for concepts
such as “data subject”, “controller” or “personal data”. Fig. 6 presents a mapping
between events in Art. 21 §1 and events in Pspec in Fig. 3.

Combining the dependencies from laws with business process information gives
rise to compliance policies that are specific to the domain. An example is the
natural language policy “in case (the subcontractor) submits a change
request, (the subcontractor) may cancel the change request. If (the subcontractor)
cancels the request, (the project manager) must eventually delete the request”.
These policies are formalized in terms of DCR processes. Fig. 7 presents some
exemplary policies. We refer to the DCR processes resulting from this stage as
compliance rules.

We capture event dependencies by relying on test-driven development [42,46],
which serves as a means of validation when introducing constraints in the
model. Interestingly, test-driven development aligns with current practices when
introducing changes in a law. Scenarios correspond to legal precedents [27]. In
common law, a legal precedent is a previous case that establishes
a principle or rule. This principle is then used by judicial bodies when deciding
later cases with similar issues or facts. Compliance rules can be tested against
scenarios representing legal precedents, where valid rules should at least be able
to reach the same decisions as the earlier precedents.

The last step in the elicitation of compliance rules is the alignment between 
the compliance rules and the process model. 


Definition 3 (Term Alignment & Target events). Let $L, L' \subseteq \mathcal{L}$.
A term alignment is a total function $g : L \rightarrow L'$. If P, Q are DCR
processes with labels L, L' respectively, we say that g is a term alignment from
P to Q if g is a term alignment from L to L'. Moreover, we define the target
events of g for e in P as $tg(g, e, P) = \lambda^{-1}(g(\lambda(e)))$.


Although term alignment is an arbitrary function defined by the compliance 
officer, we require for simplicity of the exposition that there is exactly a single 
target event for each event. 

Note that more than one g can be defined if the rules in the law apply to
more than one set of events in the process. Also, g will typically be non-surjective 
since the business process might contain activities that do not map to any legal 
requirement. 


Definition 4 (Instances of a Compliance Rule). Let $G = \{g_1, \ldots, g_n\}$ be
a set of term alignments from P to Q. An instance of P under g in Q, written
$P{\downarrow_g}Q$ for $g \in G$, is $P\sigma$ with labelling
$\lambda'(e) = g(\lambda(e))$, such that
$\sigma = \{f_1, \ldots, f_n / e_1, \ldots, e_n\}$ where $f_i = tg(g, e_i, P)$.
We denote by $Inst(P, G, Q) = \{P{\downarrow_g}Q \mid g \in G\}$ the set of all
instances of P under G in Q.
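Definitions 3 and 4 amount to renaming the events of a rule along a label mapping. The following sketch illustrates this under several assumptions that are not in the text: a process is represented by a marking dictionary, a labelling dictionary, and a set of relation triples `(source, kind, target)`; a term alignment g is a dictionary between labels; and the event names `x`, `y` used in the usage note are hypothetical. The sketch enforces the paper's simplifying requirement that each event has exactly one target event.

```python
from typing import Dict, Set, Tuple

Relation = Tuple[str, str, str]  # (source event, relation kind, target event)


def target_event(g: Dict[str, str], e: str,
                 labels_p: Dict[str, str], labels_q: Dict[str, str]) -> str:
    """tg(g, e, P): the unique event of Q whose label is g(lambda_P(e))."""
    wanted = g[labels_p[e]]
    targets = [f for f, label in labels_q.items() if label == wanted]
    if len(targets) != 1:
        raise ValueError(f"expected exactly one target event for {e!r}")
    return targets[0]


def instance(g: Dict[str, str], marking_p: dict, labels_p: Dict[str, str],
             relations_p: Set[Relation], labels_q: Dict[str, str]):
    """Instance of P under g in Q: apply the substitution sigma = {f_i / e_i}."""
    sigma = {e: target_event(g, e, labels_p, labels_q) for e in marking_p}
    marking = {sigma[e]: state for e, state in marking_p.items()}
    labels = {sigma[e]: g[labels_p[e]] for e in marking_p}
    relations = {(sigma[a], kind, sigma[b]) for (a, kind, b) in relations_p}
    return marking, labels, relations
```

For example, aligning a two-event rule with labels "A2: Submit change request" and "A5: Notifies processing to subcontractor" against a process whose events `x` and `y` carry those labels yields a copy of the rule over `x` and `y`.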


Example 4.2. The term alignments g1, g2 are built from the obvious maps from 
events in RC1 and RC2 to events with same labels in Pspec in Fig. 3. Two term 
alignments are required for RC3: 


Term Alignment | Label Reference Model       | Event | Label Process Model Pspec
g3             | A2: submit a change request | f1    | A2: submit a change request
               | Finish Processing request   | f2    | A15: Amend initial contract
               | Cancel Processing           | f3    | A3: Delete request
g4             | A2: submit a change request | f1    | A2: submit a change request
               | Finish Processing request   | f4    | A16: Receive reason for change rejection
               | Cancel Processing           | f3    | A15: Delete request


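The sets of term alignments for the compliance rules are respectively G1 =
{g1}, G2 = {g2}, and G3 = {g3, g4}. As can be seen from Def. 4, the instances
substitute the events with the corresponding ones in Pspec, so
$Inst(RC_3, G_3, P_{spec})$ consists of the two processes


$[f_1 : (f,t,f),\ f_2 : (f,f,f),\ f_3 : (f,t,f)]\ \lambda_3\ f_1 \rightarrow\!\!+\, f_2 \parallel f_1\,[\ldots]\,f_2 \parallel f_1\,[\ldots]\,f_3 \parallel f_3 \rightarrow\!\!\%\, f_2$

$[f_1 : (f,t,f),\ f_4 : (f,f,f),\ f_3 : (f,t,f)]\ \lambda_3\ f_1 \rightarrow\!\!+\, f_4 \parallel f_1\,[\ldots]\,f_4 \parallel f_1\,[\ldots]\,f_3 \parallel f_3 \rightarrow\!\!\%\, f_4$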


Moreover, labels have also changed, with $\lambda_3(f_2)$ = “A15: Amend initial
contract with approved change” and $\lambda_3(f_4)$ = “A16: Receive reason for
change rejection”.


5 Compliance Checking by Refinement


In previous sections we showed how to use DCR processes for the specification of
declarative workflows (cf. Section 3) and the generation of compliance rules (cf.
Section 4). In this section, we will consider compliance as a particular instance
of DCR process refinement [10] between each of the instances generated by a
compliance rule and the process specification.

Abstractly, we take refinement to be just inclusion of languages (trace sets).
Given a sequence s, write $s_i$ for the i-th element of s, and $s|_\Sigma$ for
the largest sub-sequence s' of s such that $s'_i \in \Sigma$ for
$0 \leq i < |s'|$; e.g., if s = AABC then $s|_{\{A,C\}} = AAC$. We lift
projection to sets of sequences point-wise.


Definition 5 (Refinement [11]). Let P, Q be processes. We say that Q is a
refinement of P iff $lang(Q)|_{alph(P)} \subseteq lang(P)$. We will write
$R \sqsubseteq P$ whenever R is a refinement of P.
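Projection and language inclusion are easy to state for finite sets of finite traces. The sketch below only illustrates Definition 5 in that restricted setting; the languages of DCR processes are in general infinite, so this is not a decision procedure for refinement. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Trace = Tuple[str, ...]


def project(s: Iterable[str], sigma: Set[str]) -> Trace:
    """s|_sigma: the largest sub-sequence of s whose elements all lie in sigma."""
    return tuple(x for x in s if x in sigma)


def refines_language(lang_q: Set[Trace], lang_p: Set[Trace],
                     alph_p: Set[str]) -> bool:
    """Finite-language version of lang(Q)|_alph(P) being included in lang(P)."""
    return {project(s, alph_p) for s in lang_q} <= lang_p
```

With s = AABC and Σ = {A, C}, `project` returns the sequence AAC from the example in the text.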


In practice, we will use a notion of refinement by composition, as introduced 
in [11] to define a "refines" relation between a process and an instance of a 
compliance rule. To define composition, we need to merge parallel markings and 
effects. Merge on markings is partial, since it is only defined on markings that 
agree on their overlap: 


$(M_1, e : m) \oplus (M_2, e : m) = (M_1 \oplus M_2), e : m$
$(M_1, e : m) \oplus M_2 = (M_1 \oplus M_2), e : m$ when $e \notin dom(M_2)$
$M_1 \oplus (M_2, e : m) = (M_1 \oplus M_2), e : m$ when $e \notin dom(M_1)$.


The merge of effects δ is defined as the pointwise union of each of the sets of
excluded/included/pending events: $(Exc_1, Inc_1, Pen_1) \oplus (Exc_2, Inc_2, Pen_2) =
(Exc_1 \cup Exc_2, Inc_1 \cup Inc_2, Pen_1 \cup Pen_2)$.
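A small sketch of the two merge operations, assuming markings are dictionaries from events to states and effects are triples of sets. Returning `None` to signal that the merge of markings is undefined is a choice made for this example only.

```python
from typing import Dict, Optional, Set, Tuple


def merge_markings(m1: Dict[str, tuple], m2: Dict[str, tuple]) -> Optional[Dict[str, tuple]]:
    """Partial merge: defined only if m1 and m2 agree on every shared event."""
    merged = dict(m1)
    for e, state in m2.items():
        if e in merged and merged[e] != state:
            return None  # markings disagree on their overlap: merge undefined
        merged[e] = state
    return merged


def merge_effects(d1: Tuple[Set, Set, Set], d2: Tuple[Set, Set, Set]) -> Tuple[Set, Set, Set]:
    """Pointwise union of the excluded, included and pending components."""
    return tuple(a | b for a, b in zip(d1, d2))
```

Two processes are marking compatible in the sense of Definition 6 when `merge_markings` returns a marking and their labellings agree.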


Definition 6 (Merge & Marking Compatibility). The merge of processes
$[M]\,\lambda_1\,T$ and $[N]\,\lambda_2\,U$ is defined if the merge of markings
$M \oplus N$ is defined and the labelling functions agree as well, in which case
$[M]\,\lambda_1\,T \oplus [N]\,\lambda_2\,U = [M \oplus N]\,(\lambda_1 \cup \lambda_2)\,(T \parallel U)$.
If the merge of two processes is defined, we say that they are marking compatible.


We can now define the refines relation between an instance P of a compliance
rule and a marking compatible process Q (i.e., the process model) as follows.


Definition 7 (Refines). Let P, Q be marking compatible processes. We say that
Q refines P iff $P \oplus Q \sqsubseteq P$.


Note that even though $P \oplus Q = Q \oplus P$, it may still be the case that
$P \oplus Q \sqsubseteq P$ but not $P \oplus Q \sqsubseteq Q$.


Definition 8 (Compliance). Let P, R be DCR processes, and G be a set of
term alignments from R to P. We say that P is strongly (resp. weakly) compliant
with R under G, written $P \lhd_G R$ (resp. $P \unlhd_G R$), if
$\forall R_i \in Inst(R, G, P)$, P refines $R_i$ (resp. if
$\exists R_i \in Inst(R, G, P)$, P refines $R_i$).


That is, take a rule R, a process P and a term alignment mapping labels in
R to P. (Strong) compliance requires us to 1) generate all instances of R in P
and 2) check whether the merge of each instance with P refines the instance.
Notice that while the merge of each instance with the process is defined,
P might have additional constraints that affect refinement.

We close this section stating results regarding the decidability and tractability 
of compliance checking for DCR processes. 


Theorem 1 (Compliance checking is decidable). Let P, R be DCR processes, and
let G be a set of term alignments from R to P. Then checking $P \lhd_G R$
and $P \unlhd_G R$ is decidable.


Proof. We know from [11] that refinement of DCR processes is decidable; this
fact relies on the state space of a DCR process being finite. Time
does not change this; see [24] for details. It is therefore sufficient to prove that for
any R and G, the set Inst(R, G, P) is finite. By Definition 4, this set is bounded
by the size of G and the number of possible substitutions σ. But G is finite by
definition, and σ is uniquely determined given a g ∈ G.


While generally checking refinement for DCR processes is NP-hard already 
in the absence of time, [11] showed that the refines relation can be approximated 
by a static property, the non-invasiveness on the graphs recalled below. 


Definition 9 (Non-invasiveness [11]). Let $P = [M_P]\,\lambda_P\,T_P$ and R be
marking compatible processes. We say that P is non-invasive for R iff


1. For every context $C[-]$ such that $T_P = C[e \rightarrow\!\!\%\, f]$ or
   $T_P = C[e \rightarrow\!\!+\, f]$, $f \notin fe(R)$; and
2. For every label $l \in alph(P) \cap alph(R)$, if $e \in fe(P)$ is labelled l,
   then $e \in fe(R)$.


That is, a process P is non-invasive for a process R if it does not introduce 
inclusion or exclusion relations on the events of R. We note that this property 
can straightforwardly be determined in polynomial time. 


Lemma 1. Non-invasiveness is decidable in polynomial time. 


Proof. Follows from Definition 9: an algorithm only needs to check for each 
inclusion and exclusion relation in P if the target event exists in R. 
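The polynomial-time check mentioned in the proof can be sketched as a single pass over the relations and labels of P. The encoding is an assumption made for illustration: relations are triples `(source, kind, target)` with kinds such as `"include"` and `"exclude"`, labellings are dictionaries from events to labels, and, since the paper restricts itself to processes without locality, the free events fe(·) of a process are taken to be all events in its labelling.

```python
from typing import Dict, Set, Tuple

Relation = Tuple[str, str, str]  # (source event, relation kind, target event)


def is_non_invasive(relations_p: Set[Relation],
                    labels_p: Dict[str, str],
                    labels_r: Dict[str, str]) -> bool:
    """Check the two conditions of Definition 9 for P against R."""
    events_r = set(labels_r)
    # 1. P has no inclusion or exclusion relation targeting an event of R.
    for (_, kind, target) in relations_p:
        if kind in ("include", "exclude") and target in events_r:
            return False
    # 2. Every event of P carrying a label shared with R is itself an event of R.
    shared_labels = set(labels_p.values()) & set(labels_r.values())
    return all(e in events_r for e, l in labels_p.items() if l in shared_labels)
```

Each relation and each labelled event is inspected once, so the running time is linear in the size of P given constant-time set lookups.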


In [11] it was also shown that non-invasiveness guarantees the refines relation.
This can be extended to timed processes.


Theorem 2. If P is non-invasive for R then P refines R. 


Proof (sketch). We need to extend the proof in [11] to timed processes, observing
the following: 1) in the case of conflicting deadlines, the strictest deadline
always takes precedence; 2) therefore, after composition of R and P which
share a timed relation with different deadlines, the strictest deadline will
be followed; and 3) the composed process will not allow traces which were
forbidden under the strictest deadline.

We can apply this result to compliance, and show that a process is compliant
with a compliance rule if it is non-invasive for all term alignments.


Lemma 2. Let P, R be DCR processes, and G be a set of term alignments
from R to P. Then P is strongly (resp. weakly) compliant with R under G if
$\forall R_i \in Inst(R, G, P)$, P is non-invasive for $R_i$.


Proof. Follows directly from Definition 8 and Theorem 2. 


Correspondingly, this means that compliance checking is a polynomial time 
task if P is non-invasive for R for all term alignments. 


Theorem 3. If P is non-invasive for every $R_i \in Inst(R, G, P)$, then checking
$P \lhd_G R$ and $P \unlhd_G R$ is polynomial in R, G, P.


Proof. Follows directly from Lemmas 1 and 2. 


We conclude that through careful construction of the process model, in partic- 
ular by avoiding the unnecessary introduction of exclusion and inclusion relations 
on events which may be governed by compliance rules, we can significantly reduce 
the time complexity of checking the compliance of the process. This is in
contrast to approaches based on annotated imperative business processes, which
to a great extent belong to the non-polynomial complexity class [45].


Corollary 1. $P_{spec} \lhd_{G_1} RC_1$, $P_{spec} \lhd_{G_2} RC_2$, and $P_{spec} \unlhd_{G_3} RC_3$.


6 Adoption considerations 


We describe two uses of the compliance framework: one at the municipality of 
Syddjurs (DK), and another at the municipality of Genoa (IT). The municipal- 
ities selected processes in different domains: the provision of benefits offered to 
young persons with special needs (DK), and the release of construction permits 
(IT). They were regulated by different laws, for which reference models of se- 
lected articles were created by compliance specialists. The reference models of 
articles in the Danish Consolidation for Social Services [44] and the
Construction Law of the Liguria region [40] vary in size and complexity, ranging
from a minimum of 4 events and 12 relations, up to 86 events and 125 relations in a
single article. The intended use of the framework varied: while Syddjurs aims at 
driving a new implementation of their processes, Genoa wanted to verify their 
current implementations with respect to the law. The work was carried out by 
case workers within the municipality (DK), and a consultancy house (IT). We 
collected feedback from users generating reference models of law about their use, 
benefits and challenges. Both organisations commented that the pairing of laws
and models provided them with traceability, and allowed lawyers to take part in
the co-creation of process implementations using their domain knowledge. Moreover,
law-process pairings helped them to understand the legislation, making evident
bottlenecks in a process (an activity on which many other events depend),
and showed them previously unknown paths for achieving a goal while still
being in accordance with the law. This aligns with previous studies on comprehension
of hybrid artefacts combining texts and declarative models [3]. On their use,
both organisations agreed that some laws are too general, and that they required
implementation guidelines to complete their models. A challenge concerned the
writing style of the guidelines: if guidelines have been written in an imperative
style, there is a risk of over-constraining the model. When asked about the
understandability of the models, they reported that after an initial training,
generated models were understandable for compliance specialists, and they could be
used as communication artefacts. However, they also reported challenges on the
understandability of large models, and suggested the inclusion of abstractions to 
increase model comprehension. With respect to compliance, the main challenge 
concerned term alignment, as it currently needs to be hard-coded (no tool sup- 
port). In some cases, an event in the law had a 1-to-many correspondence with 
the process. Another suggestion was to extend feedback support to reasons for 
non-compliance, rather than yes/no outputs. 


7 Related Work 


We can divide related approaches into four categories: 

Model Checking techniques: Most model checking techniques for compliance 
[19] represent the process as a finite state machine and the laws in a temporal 
logic. We differ from such approaches in that we use a declarative process
language for defining both the process and the laws. The reasons are threefold. First, it
is known that some of these languages present technical difficulties when
modelling permissions, obligations and defeasible (i.e., exceptional) conditions [16].
These concepts are straightforward in DCR graphs: permissions are encoded as
enabled events, obligations are the composition of events using a response
relation, and defeasible conditions are represented by mutual exclusion relations
between events. The second advantage is the possibility of combining process
narratives and visual notations: our work follows the recommendations
from [36], which state that higher cognitive loads can be achieved when
combining process descriptions with graphical notations. This is particularly important
in our case, as compliance specialists in local governments do not have prior
training in verification techniques based on temporal logics. Finally,
verification is efficient: it relies on refinement of transition systems with responses [6,28],
and although the complexity of process refinement belongs to the category of
NP-hard problems [11], we have shown that we can use syntactic restrictions to check
compliance in polynomial time.

Compliance Refinement: Seaflows [31] proposes an alignment of compliance
requirements into business processes. Laws are modelled in terms of constraints
over event traces that can be verified at design-time and monitored at run-time.
However, no specific constraint specification language is provided. The work
in [41] presents a refinement-based approach where abstract business processes
representing laws are incrementally refined until executable processes can be
generated. The nature of such abstract business processes is imperative, given
in BPMN diagrams, which imposes rigidity on how to achieve certain rights.

Compliance-by-design (CbD): FCL/PCL & Regorous [13,14,17,18,21] treat
compliance as a property of the process to execute while not violating the laws
in a regulation. Compliance checking requires one to (1) identify the deontic effects
of the set of modelled regulations, (2) determine the tasks and the obligations
in force for each task, and (3) check whether the obligations have been fulfilled
or postponed after the execution of a task. While we subscribe to CbD as a
methodology, our approach differs in that there is no need to map a
declarative language (such as PCL and FCL) into an imperative specification.

Visual Languages for Compliance: The work in [26] introduces eCRG, a visual
modelling notation for compliance rules including control flow, interaction,
time, data, and resource perspectives. eCRG rules are then paired with event
logs to determine whether completed or running process instances are compliant.
While our approach is mostly tailored to design stages, [26] focuses on
after-the-fact compliance. Finally, the BPMN-Q language [4] provides a visual notation
for CTL, and the language describes compliance rules including control and data
flow aspects, which are later model-checked against BPMN models. Declare [38]
is LTL-based and, in principle, the compliance checking approach presented here
could also be used. However, its LTL semantics has been shown to present
technical difficulties when modelling obligations and defeasible conditions [16].


8 Concluding Remarks 


We presented a verification framework for the design of process models that 
are compliant with regulations. This work exploits the similarities of declarative 
process languages with logical languages to be able to express models of law. 
In this manner, both process models and models of law are described in the 
same declarative notation, and it becomes straightforward to verify whether 
compliance is achievable. We show that compliance can be checked efficiently in 
polynomial time, given careful construction of the models. 

While the focus of this paper is centred on CbD approaches, it accommo- 
dates after-the-fact compliance. In future work we will explore other variants of 
compliance, such as process conformance based on event logs. Our results rely 
on the choice of DCR as language for reference and process models, and in this 
paper we have restricted ourselves to a version of DCR graphs without subpro- 
cesses and locality. The decidability results in Thm. 1 will not hold with the 
inclusion of these operators. We have not needed to consider such constructs in 
the construction of compliance rules so far, but it would be interesting to revisit 
them in future work, as well as multi-dimensional compliance policies [39].


Abstract. A blockchain is a distributed hierarchical data structure.
Widely-used applications of blockchain include digital currencies such 
as Bitcoin and Ethereum. This paper proposes an algorithmic approach 
to analyze the efficiency of a blockchain as a function of the number of 
blocks and the average synchronization delay. The proposed algorithms 
consider a random network model that characterizes the growth of a tree 
of blocks by adhering to a standard protocol. The model is paramet- 
ric on two probability distribution functions governing block production 
and communication delay. Both distributions determine the synchroniza- 
tion efficiency of the distributed copies of the blockchain among the so- 
called workers and, therefore, are key for capturing the overall stochastic 
growth. Moreover, the algorithms consider scenarios with a fixed or an 
unbounded number of workers in the network. The main result illustrates 
how the algorithms can be used to evaluate different types of blockchain 
designs, e.g., systems in which the average time of block production can 
match the average time of message broadcasting required for synchro- 
nization. In particular, this algorithmic approach provides insight into 
efficiency criteria for identifying conditions under which increasing block 
production has a negative impact on the stability of a blockchain. The 
model and algorithms are agnostic of the blockchain’s final use, and they 
serve as a formal framework for specifying and analyzing a variety of 
non-functional properties of current and future blockchains. 


1 Introduction 


A blockchain is a distributed hierarchical data structure that cannot be modified 
(retroactively) without alteration of all subsequent blocks and the consensus of a 
majority. It was invented to serve as the public transaction ledger of Bitcoin [22]. 
Instead of relying on a trusted third party, this digital currency is based on the
concept of ‘proof-of-work’, which allows users to execute payments by signing
transactions using hashes through a distributed time-stamping service.
Resistance to modifications, decentralized consensus, and robustness for supporting
cryptocurrency transactions unleash the potential of blockchain technology
for uses in various industries, including financial services [12,26,3], distributed
data models [5], markets [25], government systems [15,23], healthcare [13,1,18],
IoT [16], and video games [21].

Technically, a blockchain is a distributed append-only data structure
comprising a linear collection of blocks, shared among so-called workers, often
also referred to as miners. These miners generally represent computational nodes
responsible for working on extending the blockchain with new blocks. Since the 
blockchain is decentralized, each worker possesses a local copy of the blockchain, 
meaning that two workers can build blocks at the same time on unsynchronized 
local copies of the blockchain. In the typical peer-to-peer network implementa- 
tion of blockchain systems, workers adhere to a consensus protocol for inter-node 
communication and validation of new blocks. Specifically, workers build on top 
of the largest blockchain. If they encounter two blockchains of equal length, 
then workers select the chain whose last produced block was first observed. This 
protocol generally guarantees an effective synchronization mechanism, provided 
that the task of producing new blocks is hard to achieve in comparison to the 
time it takes for inter-node communication. The effort of producing a block rel- 
ative to that of communicating among nodes is known in the literature as ‘proof 
of work’. If several workers extend different versions of the blockchain, the con- 
sensus mechanism enables the network to eventually select only one of them, 
while the others are discarded (including the data they carry) when local copies 
are synchronized. The synchronization process persistently carries on upon the 
creation of new blocks. 
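The chain selection rule described above (build on the largest chain and, on a tie, prefer the chain whose last block was observed first) can be sketched as follows. The `Chain` record and its fields are hypothetical names introduced for this illustration; they summarize only the two quantities the rule depends on.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Chain:
    length: int        # number of blocks in the chain
    first_seen: float  # local time at which the worker first observed its last block


def select_chain(chains: Iterable[Chain]) -> Chain:
    """Pick the longest chain; break ties in favour of the earliest observed one."""
    return min(chains, key=lambda c: (-c.length, c.first_seen))
```

Because `first_seen` is local to each worker, two workers may resolve the same tie differently, which is one source of the temporary forks discussed in the text.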


The scenario of discarding blocks massively, which can be seen as an efficiency 
issue in a blockchain implementation, is rarely present in “slow” block-producing 
blockchains. The reason is that the time it takes to produce a new block is 
long enough for workers to synchronize their local copies of the blockchain. Slow 
blockchain systems avert workers from wasting resources and time in producing 
blocks that are likely to be discarded in an upcoming synchronization. In Bitcoin, 
for example, it takes on average 10 minutes for a block to be produced and only 
12.6 seconds to communicate an update [8]. The theoretical fork-rate of Bitcoin 
in 2013 was approximately 1.78% [8]. However, as the blockchain technology 
finds new uses, it is being argued that block production needs to be faster [6,7]. 
Broadly speaking, understanding how speed-ups in block production can neg- 
atively impact blockchains, in terms of the number of blocks discarded due to 
race conditions among the workers, is important for designing new fast and yet 
efficient blockchains. 


This paper introduces a framework to formally study blockchains as a particular
class of random networks with emphasis on two key aspects: the speed of block
production and the network synchronization delays. As such, it is parametric on 
the number of workers under consideration (possibly infinite), the probability 
distribution function that specifies the time for producing new blocks, and the 
probability distribution function that specifies the communication delay between 
any pair of randomly selected workers. The model is equipped with probabilistic 
algorithms to simulate and formally analyze blockchains concurrently produc- 
ing blocks over a network with varying communication delays. These algorithms 
focus on the analysis of the continuous process of block production in fast and 
highly distributed systems, in which inter-node communication delays are crucial.


402 C. Pinzón et al.


The framework enables the study of scenarios with fast block production,
in which blocks tend to be discarded at a high rate. In particular, it captures the 
trade-off between speed and efficiency. Experiments are presented to understand 
how this trade-off can be analyzed for different scenarios. As fast blockchain 
systems tend to spread to novel applications, the algorithmic approach provides 
mathematical tools for specifying, simulating, and analyzing blockchain systems. 

It is important to highlight that the proposed model and algorithms are ag- 
nostic of the concrete implementation and final use of the blockchain system. 
For instance, the ‘rewards’ for mining blocks such as the ones present in the 
Bitcoin network are not part of the model and are not considered in the analysis
algorithms. On the one hand, features of this sort can be seen as particular
mechanisms of a blockchain implementation that are not explicitly required for 
the system to evolve as a blockchain. Thus, including them as part of the frame- 
work can narrow its intended aim as a general specification, design, and analysis 
tool. On the other hand, such features may be abstracted away into the proposed 
model by tuning the probability distribution functions that are parameters of 
the model, or by considering a more refined base of choices among the many 
probability distribution functions at hand for a specific analysis. Therefore, the 
proposed model and algorithms are general enough to encompass a wide variety 
of blockchain systems and their analysis. 

The contribution of this work is threefold. First, a random network model 
is introduced (in the spirit of, e.g., Barabási-Albert [4] and Erdős-Rényi [9]) for
specifying blockchains in terms of the speed of block production and communica- 
tion delays for synchronization among workers. Second, exact and approximation 
algorithms for the analysis of blockchain efficiency are made available. Third, 
based on the proposed model and algorithms, empirical observations about the 
tensions between production speed and synchronization delay are provided. 

The remaining sections of the paper are organized as follows. Section 2 sum- 
marizes basic notions of proof-of-work blockchains. Sections 3 and 4 introduce 
the proposed network model and algorithms. Section 5 presents experimental re- 
sults on the analysis of fast blockchains. Section 6 relates these results to existing 
research, and draws some concluding remarks and future research directions. 


2 An Overview of Proof-of-work Blockchains 


This section overviews the concept of proof-of-work distributed blockchain sys- 
tems and introduces basic definitions, which are illustrated with the help of an 
example. 

A blockchain is a distributed hierarchical data structure of blocks that cannot 
be modified (retroactively) without alteration of all subsequent blocks and the 
consensus of the network majority. The nodes in the network, called workers, 
use their computational power to generate blocks with the goal of extending the 
blockchain. The adjective ‘proof-of-work’ comes from the fact that producing a 
single block for the blockchain tends to be a computationally hard task for the 
workers, e.g., a partial hash inversion.

Definition 1. A block is a digital document containing: (i) a digital signature
of the worker who produced it; (ii) an easy to verify proof-of-work witness in the
form of a nonce; and (iii) a hash pointer to the previous block in the sequence
(except for the first block, called the origin, that has no previous block and is 
unique). 


Technical definitions of blockchain as a data structure have been proposed 
by different authors (see, e.g., [27]). Most of them coincide on it being an im- 
mutable, transparent, and decentralized data structure shared by all workers 
in the network. For the purpose of this paper, it is important to distinguish 
between the local copy, independently owned by each worker, and the abstract 
global blockchain, shared by all workers. The latter holds the complete history 
of the blockchain. 


Definition 2. The local blockchain of a worker w is a non-empty sequence of 
blocks stored in the local memory of w. The global blockchain (or, blockchain) 
is the minimal rooted tree containing all workers’ local blockchains as branches. 


Under the assumption that the origin is unique (Definition 1), the (global) 
blockchain is well-defined for any number of workers present in the network. 
If there is at least one worker, then the blockchain is non-empty. Definition 2 
allows for local blockchains to be either synchronized or unsynchronized. The 
latter is common in systems with long communication delays or in the presence 
of anomalous situations (e.g., if a malicious group of workers is holding a fork 
intentionally). As a consequence, the global blockchain cannot simply be defined 
as a unique sequence of blocks, but rather as a distributed data structure with
which workers are assumed to be partly synchronized.

Figure 1 presents an example of a blockchain with five workers, where blocks 
are represented by natural numbers. On the left, the local blockchains are de- 
picted as linked lists; on the right, the corresponding global blockchain is depicted 
as a rooted tree. Some of the blocks in the rooted tree representation in Figure 1 
are labeled with the identifier of a worker, which indicates the position of each 
worker in the global blockchain. For modeling, the rooted tree representation of 
a blockchain is preferred. On the one hand, it can reduce the amount of memory 
needed for storage and, on the other hand, it visually simplifies the inspection 
of the data structure. Furthermore, storing a global blockchain with m workers 
containing n unique blocks as a collection of lists requires in the worst-case sce- 
nario O(mn) memory (i.e., with perfect synchronization). In contrast, the rooted 
tree representation of the same blockchain with m workers and n unique blocks 
requires O(n) memory for the rooted tree (e.g., using parent pointers) and an 
O(m) map for assigning each worker its position in the tree, totaling O(n + m) 
memory. 
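
The O(n + m) representation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `build_global_tree` and `local_chain` are hypothetical, and the input is the set of local blockchains of Figure 1.

```python
from typing import Dict, List, Optional


def build_global_tree(local_chains: Dict[str, List[int]]):
    """Merge local blockchains into parent pointers plus a worker-position map."""
    parent: Dict[int, Optional[int]] = {}
    position: Dict[str, int] = {}
    for worker, chain in local_chains.items():
        previous = None
        for block in chain:
            # Each unique block is stored once, with a pointer to its parent.
            parent.setdefault(block, previous)
            previous = block
        position[worker] = chain[-1]  # last block of the local blockchain
    return parent, position


def local_chain(parent: Dict[int, Optional[int]], tip: int) -> List[int]:
    """Recover a local blockchain by following parent pointers to the origin."""
    chain = []
    node: Optional[int] = tip
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain[::-1]


# Local blockchains of Figure 1 (blocks are natural numbers, 0 is the origin).
figure1 = {
    "w0": [0, 1, 5],
    "w1": [0, 2, 3, 6],
    "w2": [0, 2, 4],
    "w3": [0, 2, 3],
    "w4": [0, 2, 3, 6],
}
parent, position = build_global_tree(figure1)
```

The tree holds one entry per unique block and the map one entry per worker, whereas the list-based encoding stores every block once per worker that knows it.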

w0: 0 ← 1 ← 5
w1: 0 ← 2 ← 3 ← 6
w2: 0 ← 2 ← 4
w3: 0 ← 2 ← 3
w4: 0 ← 2 ← 3 ← 6

[Global blockchain (right panel): rooted tree at block 0 with branches
0 ← 1 ← 5 (position of w0), 0 ← 2 ← 4 (position of w2), and 0 ← 2 ← 3 ← 6
(w3 stands on block 3; w1 and w4 stand on block 6).]

Fig. 1: A blockchain network of five workers with their local blockchains (left) and
the corresponding global blockchain (right); blocks are represented by natural
numbers. Workers w0, w2, and w3 are not yet synchronized with the longest
sequence of blocks.

A blockchain tends to achieve synchronization among the workers due to the
following reasons. First, workers follow a standard protocol in which they are
constantly trying to produce new blocks and broadcasting their achievements to
the entire network. In the case of cryptocurrencies, for instance, this behavior
is motivated by paying a reward. Second, workers can easily verify (i.e., with
a fast algorithm) the authenticity of any block. If a malicious worker (i.e., an
attacker) changes the information of one block, that worker is forced to repeat 
the extensive proof-of-work process for that block and all its subsequent blocks 
in the blockchain. Otherwise, its malicious modification cannot become part of 
the global blockchain. Since repeating the proof-of-work process requires that 
the attacker spends a prohibitively high amount of resources (e.g., electricity, 
time, and/or machine rental), such a situation is unlikely to occur. Third, the 
standard protocol forces any malicious worker to confront the computational 
power of the whole network, assumed to have mostly honest nodes. 

Algorithm 1 presents a definition of the above-mentioned standard protocol, 
which is followed by each worker in the network. When a worker produces a new
block, the worker appends it to the block it is standing on, moves to it, and notifies the
network about its current position and new distance to the root. Upon reception
of a notification, a worker compares its current distance to the root with the 
incoming position. Such a worker switches to the incoming position whenever 
it represents a greater distance. To illustrate the use of the standard protocol 
with a simple example, consider the blockchains depicted in Figures 1 and 2. In
the former, either w1 or w4 produced block 6, but the other workers are not yet
aware of its existence. In the latter, most of the workers are synchronized with 
the longest branch, which is typical of a slow blockchain system, and results in 
a tree with few and short branches. 
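
The two reactions of a worker under the standard protocol can be sketched as a small class. The class `Worker` and its method names are hypothetical helpers introduced only for illustration; the replay assumes that w1 (rather than w4) produced block 6 in Figure 1.

```python
from typing import List


class Worker:
    """One worker following the standard protocol (hypothetical helper class)."""

    def __init__(self) -> None:
        self.chain: List[int] = [0]  # local blockchain, starting at the origin

    def on_block_produced(self, block: int) -> List[int]:
        """Task 1 finished first: extend the local chain and return the notification."""
        self.chain.append(block)
        return list(self.chain)  # copy that is broadcast to the network

    def on_notification(self, incoming: List[int]) -> None:
        """Task 2 finished first: switch only if the incoming chain is strictly longer."""
        if len(incoming) > len(self.chain):
            self.chain = list(incoming)


# Replaying Figure 1 under the assumption that w1 produced block 6.
w0, w1, w3 = Worker(), Worker(), Worker()
w0.chain = [0, 1, 5]
w1.chain = [0, 2, 3]
w3.chain = [0, 2, 3]
message = w1.on_block_produced(6)
w3.on_notification(message)     # longer chain: w3 switches
w0.on_notification(message)     # four blocks against three: w0 switches as well
w1.on_notification([0, 1, 5])   # shorter chain: ignored
```

Once the notification reaches every worker, all local blockchains coincide with the longest branch, which is the synchronized situation typical of slow systems.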


[Figure 2 diagram (rooted tree of the global blockchain) not recoverable from the source.]


Fig. 2: Example of a typical slow system with few and short branches. 


Algorithm 1: Standard protocol for each worker w_i in a blockchain.

B_i ← [origin]
do forever
    do in parallel, stop on first to occur
        Task 1: b ← produce a subsequent block for B_i
        Task 2: B' ← notification from another worker
    end
    if Task 1 has been completed then
        append b to B_i
        notify workers in the network about B_i
    else if B' is longer than B_i then
        B_i ← B'
    endif
end

Some final remarks on inter-node communication and implementations for
enforcing the standard protocol are due. Note that message communication in the
standard protocol is required to include enough information about the position of
a worker to be located in the tree. The level of detail of this information depends,
generally, on the design of the particular blockchain system. On the one hand,
sending the complete sequence from root to end as part of such a message is an
accurate, but also expensive approach, in terms of bandwidth, computation, and 
time. On the other hand, sending only the last block as part of the message is 
modest on resources, but can represent a communication conundrum whenever 
the worker being notified about a new block x is not yet aware of the parent 
block of x. In contrast to slow systems, this situation may frequently occur in fast 
systems. The workaround is to use subsequent messages to query the previous 
blocks of x, as needed, thus extending the average duration of inter-worker
communication. 


3 A Random Network Model for Blockchains 


The network model generates a rooted tree representing a global blockchain 
from a collection of linked lists representing local blockchains (see Definition 2). 
It consists of three mechanisms, namely, growth, attachment, and broadcast. By 
growth it is meant that the number of blocks in the network increases by one 
at each time step. Attachment refers to the fact that new blocks connect to an 
existing block, while broadcast refers to the fact that the newly connected block 
is announced to the entire network. The model is parametric in a natural number
m specifying the number of workers, and two probability distributions $\alpha$ and $\beta$
governing the growth, attachment, and broadcast mechanisms. Internally, the
growth mechanism creates a new block to be assigned at random among the m
workers by taking a sample from $\alpha$ (the time it takes to produce such a block)
and broadcasts a synchronization message, whose reception time is sampled from
$\beta$ (the time it takes the other workers to update their local blockchains with the
new block).

A network at a given discrete step $n$ is represented as a rooted tree
$T_n = (V_n, E_n)$, with nodes $V_n \subseteq \mathbb{N}$ and edges $E_n \subseteq V_n \times V_n$, and a map
$\omega_n : \{0, 1, \ldots, m-1\} \to V_n$. A node $u \in V_n$ represents a block $u$ in the network
and an edge $(u, v) \in E_n$ represents a directed edge from block $u$ to its parent
block $v$. The assignment $\omega_n(w)$ denotes the position (i.e., the last block in the
local blockchain) of worker $w$ in $T_n$.


Definition 3. (Growth model) Let $\alpha$ and $\beta$ be positive and non-negative
probability distributions, respectively. The algorithm used in the network model
starts with $V_0 = \{b_0\}$, $E_0 = \{\}$ and $\omega_0(w) = b_0$ for all workers $w$, where
$b_0 = 0$ is the root block (origin). At each step $n > 0$, $T_n$ evolves as follows:


Growth. A new block $b_n$ (or, simply, $n$) is created with production time $\alpha_n$
sampled from $\alpha$. That is, $V_n = V_{n-1} \cup \{n\}$.

Attachment. Uniformly at random, a worker $w \in \{0, 1, \ldots, m-1\}$ is chosen
for the new block to extend its local blockchain. A new edge appears so that
$E_n = E_{n-1} \cup \{(\omega_{n-1}(w), n)\}$, and $\omega_{n-1}$ is updated to form $\omega_n$ with the new
assignment $w \mapsto n$, that is, $\omega_n(w) = n$ and $\omega_n(z) = \omega_{n-1}(z)$ for any $z \neq w$.

Broadcast. Worker $w$ broadcasts the extension of its local blockchain with the
new block $n$ to any other worker $z$ with time $\beta_{n,z}$ sampled from $\beta$.


The rooted tree generated by the model in Definition 3 begins with block 0 
(the root) and adds new blocks n = 1,2,... to some of the workers. At 
each step n > 0, a worker w is selected at random and its local blockchain, 
$0 \leftarrow \cdots \leftarrow \omega_{n-1}(w)$, is extended to $0 \leftarrow \cdots \leftarrow \omega_{n-1}(w) \leftarrow n = \omega_n(w)$. This
results in a concurrent random global behavior, inherent to distributed blockchain
systems, not only because the workers are chosen randomly due to the proof- 
of-work scheme, but also because the communication delays bring some workers 
out of sync. It is important to note that the steps n = 0,1,2,... are logical time 
steps, not to be confused with the sort of time units sampled from the variables
$\alpha$ and $\beta$. More precisely, although the model does not mention explicitly the time
advancement, it assumes implicitly that workers are synchronized at the
corresponding point in the logical future. For instance, if $w$ sends a synchronization
message of a newly created block $n$ to another worker $z$, at the end of logical
step $n$ and taking $\beta_{n,z}$ time, the message will be received by $z$ during the logical
step $n' > n$ that satisfies $\sum_{i=n+1}^{n'-1} \alpha_i \le \beta_{n,z} < \sum_{i=n+1}^{n'} \alpha_i$.

Another two reasonable assumptions are implicitly made in the model, 
namely: (i) the computational power of all workers is similar; and (ii) any broad- 
casting message includes enough information about the new and previous blocks, 
so that no re-transmission is required to fill block gaps (or, equivalently, that 
these re-transmission times are included in the delay sampled from $\beta$).
Assumption (i) justifies why the worker producing the new block is chosen uniformly at
random. Thus, instead of simulating the proof-of-work of the workers to know
who will produce the next block and at what time, it is enough to select a worker
uniformly and take a sample time from $\alpha$. Assumption (ii) helps in keeping the
model description simple. Without Assumption (ii), it would be mandatory to 
explicitly define how to proceed when a worker is severely out of date and re- 
quires several messages to get synchronized. 

In practice, the distribution $\alpha$ that governs the time it takes for the network,
as a single entity, to produce a block is exponential with mean $\bar{\alpha}$. Since
proof-of-work is based on finding a nonce that makes a hashing function fall into a
specific set of targets, the process of producing a block is statistically equivalent
to waiting for a success in a sequence of Bernoulli trials. Such waiting times
would, at first, correspond to a discrete geometric distribution. However, because
the time between trials is very small compared to the average time between
successes (usually fractions of microseconds against several seconds or minutes),
the discrete geometric distribution can be approximated by a continuous
exponential distribution function. Finally, note that the choice of the distribution
function $\beta$ that governs the communication delay, and whose mean is denoted
by $\bar{\beta}$, heavily depends on the system under consideration and its communication
details (e.g., its hardware and protocol).
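
The quality of the geometric-to-exponential approximation can be checked numerically. The sketch below compares the probability of still waiting after time t under both models; the trial duration (one microsecond) and the mean production time (ten seconds) are illustrative assumptions, not values taken from a concrete system.

```python
import math


def geometric_survival(t: float, trial_time: float, success_prob: float) -> float:
    """P(no success within time t) when one Bernoulli trial takes trial_time."""
    trials = math.floor(t / trial_time)
    return (1.0 - success_prob) ** trials


def exponential_survival(t: float, mean: float) -> float:
    """P(waiting time > t) for an exponential distribution with the given mean."""
    return math.exp(-t / mean)


# Illustrative numbers (assumptions): one hash attempt per microsecond and
# a mean block production time of 10 seconds.
trial_time = 1e-6
mean = 10.0
success_prob = trial_time / mean  # chosen so that the mean geometric wait is 10 s

max_gap = max(
    abs(geometric_survival(t, trial_time, success_prob) - exponential_survival(t, mean))
    for t in (0.5, 1.0, 5.0, 10.0, 30.0, 60.0)
)
```

Under these assumptions the two survival functions differ by far less than one part in a million at every sampled time, which supports sampling block production times from a continuous distribution.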


4 Algorithmic Analysis of Blockchain Efficiency 


This section presents an algorithmic approach to the analysis of blockchain efficiency.
The algorithms are used to estimate the proportion of valid blocks that
are produced during a fixed number of growth steps, based on the network model 
introduced in Section 3, for blockchains with fixed and unbounded number of 
workers. In general, although presented in this section for the specific purpose of 
measuring blockchain efficiency, these algorithms can be easily adapted to com- 
pute other metrics of interest, such as the speed of growth of the longest branch, 
the relation between confirmations of a block and the probability of being valid 
in the long term, or the average length of forks. 


Definition 4. Let $T_n = (V_n, E_n)$ be a blockchain that satisfies Definition 3. The
proportion of valid blocks $p_n$ in $T_n$ is defined as the random variable:

$$p_n = \frac{\max\{\mathrm{dist}(0, u) \mid u \in V_n\}}{|V_n|}.$$

The proportion of valid blocks $p$ produced for a blockchain (in the limit) is defined
as the random variable:

$$p = \lim_{n \to \infty} p_n.$$

Their expected values are denoted with $\bar{p}_n$ and $\bar{p}$, respectively.


Note that $p_n$ and $p$ are random variables particularly useful to determine
some important properties of blockchains. For instance, the probability that a
newly produced block becomes valid in the long run is $p$. The average rate at
which the longest branch grows is approximated by $p/\bar{\alpha}$. Moreover, the rate at
which invalid blocks are produced is approximately $(1 - p)/\bar{\alpha}$ and the expected
time for a block to receive a confirmation is $\bar{\alpha}/p$. Although $p_n$ and $p$ are random
for any single simulation, their expected values $\bar{p}_n$ and $\bar{p}$ can be approximated
by averaging several Monte Carlo simulations.
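
As a worked example, the three approximations above can be evaluated for given values of p and of the mean production time. The sketch reads the quotients as divisions by the mean of the production-time distribution; the names `DerivedMetrics` and `derived_metrics` are hypothetical, and the input values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DerivedMetrics:
    growth_rate: float        # longest-branch growth, p / mean_alpha
    invalid_rate: float       # invalid blocks per time unit, (1 - p) / mean_alpha
    confirmation_time: float  # expected time per confirmation, mean_alpha / p


def derived_metrics(p: float, mean_alpha: float) -> DerivedMetrics:
    """Evaluate the three approximations quoted in the text for given p and mean alpha."""
    if not 0.0 < p <= 1.0 or mean_alpha <= 0.0:
        raise ValueError("expected 0 < p <= 1 and mean_alpha > 0")
    return DerivedMetrics(p / mean_alpha, (1.0 - p) / mean_alpha, mean_alpha / p)


# Illustrative values only: p = 0.8 and a mean production time of 2 time units.
metrics = derived_metrics(p=0.8, mean_alpha=2.0)
```

With these inputs the longest branch grows by 0.4 blocks per time unit and a confirmation is expected every 2.5 time units.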

The three algorithms presented in the following subsections are sequential
and single threaded¹, designed to compute the value of $p_n$ under the standard
protocol (Algorithm 1). They can be used for computing $\bar{p}_n$ and, thus, for
approximating $\bar{p}$ for large values of $n$. The first and second algorithms compute the
exact value of $p_n$ for a bounded number of workers. While the first algorithm
simulates the three mechanisms present in the network model (i.e., growth,
attachment, and broadcast; see Definition 3), the second one takes a more
time-efficient approach for computing $p_n$. The third algorithm is a fast approximation
algorithm for $p_n$, useful in the context of an unbounded number of workers. It is
of special interest for studying the efficiency of large and fast blockchain systems
because its time complexity does not depend on the number of workers in the
network.

¹ This would be mitigated by the fact that parallelization may be available for the
Monte-Carlo simulations.


4.1 Network Simulation with a Priority Queue 


Algorithm 2 simulates the model with m workers running concurrently under the 
standard protocol for up to n logical steps. It uses a list B of m block sequences 
that reflect the local copy of each worker. The sequences are initially limited to 
the origin block 0 and can be randomly extended during the simulation. Each 
iteration of the main loop consists of four stages: (i) the wait for a new block to 
be produced, (ii) the reception of messages within a given waiting period, (iii) the 
addition of a block to the blockchain of a randomly selected worker, and (iv) the 
broadcasting of the new position of the selected worker in the shared blockchain 
to the other workers. The priority queue pq is used to queue messages for future 
delivery, thus simulating the communication delays. Messages have the form 
$(t', i, B')$, where $t'$ represents the arrival time of the message, $i$ is the recipient
worker, and $B'$ the content that informs that a (non-specified) worker has the
sequence of blocks $B'$. The statements $\alpha()$ and $\beta()$ draw samples from $\alpha$ and $\beta$,
respectively.

The overall complexity of Algorithm 2 depends, as usual, on specific assump- 
tions on its concrete implementation. First, let the time complexity to query 
$\alpha()$ and $\beta()$ be $O(1)$, which is a reasonable assumption in most computer
programming languages. Second, note that the following time complexity estimates
may be higher depending on their specific implementations (e.g., if a histogram
is used instead of a continuous function for sampling these variables). In
particular, consider two implementation variants. For both variants, the average
length of the priority queue with arbitrarily large $n$ is expected to be $O(m)$,
more precisely, $m\bar{\beta}/\bar{\alpha}$. Consider a scenario in which the statement $B_i \leftarrow B'$ is
implemented by creating a copy in $O(n)$ time and the append statement is $O(1)$
time. The overall time complexity of the algorithm is $O(mn^2)$. Now consider a
scenario in which $B_i \leftarrow B'$ merely copies the list reference in $O(1)$ time and the
append statement creates a copy in $O(n)$ time. For the case where $n \gg m$, under
the assumption that the priority queue has log-time insertion and removal, the
time complexity is brought down to $O(n^2)$. In either case, the spatial complexity
is $O(mn)$.

Algorithm 2: Simulation of m workers using a priority queue.

t ← 0
B ← [[0], [0], ..., [0]]    (m block sequences, 0 is the origin)
pq ← empty priority queue
for k ← 1, ..., n−1 do
    t ← t + α()
    for (t', i, B') ∈ pq with t' < t do    (receive notifications)
        pop (t', i, B') from pq
        if B' is longer than B_i then B_i ← B' endif
    end
    j ← random_worker()    (block producer)
    append a new block (k) to B_j
    for i ∈ {0, ..., m−1} \ {j} do    (publish notifications)
        push (t + β(), i, B_j) to pq
    end
end
s ← argmax_{s ∈ B} |s|    (longest sequence)
return |s|/n

A key advantage of Algorithm 2 is that with a slight modification it can
return the blockchain $s$ instead of the proportion $p_n$, which enables a richer
analysis in the form of additional metrics other than $p$. For example, assume
that $I$ denotes the random variable that describes the quantity of invalid blocks
that are created between consecutive blocks. The expected value $E[I]$ can be
estimated from $p$ as $E[I] \approx (1 - p)/\bar{\alpha}$. Building a complete blockchain can be
used to estimate not only $E[I]$, but also a complete histogram of $I$ and various
properties it may possess.
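
A possible Python rendering of Algorithm 2 is sketched below, using the standard `heapq` module as the priority queue. It follows the second implementation variant discussed above (switching copies a reference, appending creates a copy). The function name `simulate_priority_queue`, the explicit `rng` argument, and the tie-breaking counter are assumptions introduced for this sketch.

```python
import heapq
import random
from typing import Callable, List, Tuple


def simulate_priority_queue(
    n: int,
    m: int,
    alpha: Callable[[], float],
    beta: Callable[[], float],
    rng: random.Random,
) -> float:
    """Sketch of Algorithm 2: returns |longest local chain| / n."""
    t = 0.0
    chains: List[List[int]] = [[0] for _ in range(m)]  # every worker starts at the origin
    pq: List[Tuple[float, int, int, List[int]]] = []   # (arrival, tie-breaker, recipient, chain)
    counter = 0  # tie-breaker so that heapq never has to compare two chains
    for k in range(1, n):
        t += alpha()
        # Deliver every notification that arrived before the new block is produced.
        while pq and pq[0][0] < t:
            _, _, i, incoming = heapq.heappop(pq)
            if len(incoming) > len(chains[i]):
                chains[i] = incoming
        j = rng.randrange(m)         # block producer, chosen uniformly
        chains[j] = chains[j] + [k]  # append by copy; chains already sent stay unchanged
        for i in range(m):
            if i != j:
                heapq.heappush(pq, (t + beta(), counter, i, chains[j]))
                counter += 1
    return max(len(chain) for chain in chains) / n


rng = random.Random(2020)
estimate = simulate_priority_queue(
    200, 10, lambda: rng.expovariate(1.0), lambda: rng.expovariate(10.0), rng
)
```

With zero communication delay every worker is synchronized before the next block appears, so the function returns exactly 1; this is a convenient sanity check for any implementation.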


4.2 A Faster Simulation Algorithm 


Algorithm 3: Simulation of m workers using a matrix d

t_0, h_0, z_0 ← 0, 1, 0
d_0 ← (0, 0, ..., 0)    (m elements)
for k ← 1, ..., n−1 do
    j ← random_worker()
    t_k ← t_{k−1} + α()
    h_k ← 1 + max{h_i | i < k ∧ t_i + d_{i,j} < t_k}    (Algorithm 4)
    z_k ← max(z_{k−1}, h_k)
    d_k ← (β(), ..., β(), 0, β(), ..., β())    (0 at the j'th position)
end
return z_{n−1}

Algorithm 3 is a faster alternative to Algorithm 2. It uses a different encoding
for the collection of local blockchains. In particular, Algorithm 3 stores the length
of the blockchains instead of the sequences themselves. Thereby, it suppresses
the need for a priority queue. Algorithm 4 offers an optimized routine that can
be called from Algorithm 3.


Algorithm 4: Fast computation of h_k given t_i, z_i, h_i and d_i for all i < k

x, i ← 1, k−1
while i > 0 and x < z_i do
    if t_i < t_k − d_{i,j} and h_i > x then
        x ← h_i
    endif
    i ← i − 1
end
return 1 + x    (compute h_k := 1 + max({h_i | i < k ∧ t_i + d_{i,j} < t_k} ∪ {1}))


Let $t_k$ represent the (absolute) time at which block $k$ is created, $h_k$ the length
of the local blockchain after being extended with block $k$, and $z_k$ the cumulative
maximum given by

$$z_k := \max\{h_i \mid i \le k\}.$$


The spatial complexity of Algorithm 3 is $O(mn)$ due to the computation of
matrix $d$ and its time complexity is $O(nm + n^2)$ when Algorithm 4 is not used.
Note that there are $n$ iterations, each requiring $O(n)$ and $O(m)$ time for
computing $h_k$ and $d_k$, respectively. However, if Algorithm 4 is used for computing
$h_k$, the average overall complexity is reduced. In the worst-case scenario, the
complexity of Algorithm 4 is $O(k)$. However, the experimental evaluations
suggest an average below $O(\bar{\beta}/\bar{\alpha})$ (constant with respect to $k$). Thus, the average
runtime complexity of Algorithm 3 is bounded by $O(nm + \min\{n^2, n + n\bar{\beta}/\bar{\alpha}\})$,
and this corresponds to $O(nm)$, unless the blockchain system is extremely fast
($\bar{\beta} \gg \bar{\alpha}$).
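
Algorithms 3 and 4 can be combined in a short Python sketch. The function names `pruned_height` and `simulate_matrix` and the explicit `rng` argument are assumptions; the initial values follow Algorithm 3 as printed, and the function returns the length of the longest branch, which can be divided by n to obtain a proportion comparable to the output of Algorithm 2.

```python
import random
from typing import Callable, List


def pruned_height(k: int, j: int, t: List[float], h: List[int], z: List[int],
                  d: List[List[float]]) -> int:
    """Sketch of Algorithm 4: 1 + height of the tallest block worker j knows at t[k]."""
    x, i = 1, k - 1                    # x = 1 accounts for the origin
    while i > 0 and x < z[i]:          # stop once no earlier block can beat x
        if t[i] + d[i][j] < t[k] and h[i] > x:
            x = h[i]
        i -= 1
    return 1 + x


def simulate_matrix(n: int, m: int, alpha: Callable[[], float],
                    beta: Callable[[], float], rng: random.Random) -> int:
    """Sketch of Algorithm 3: returns z_{n-1}, the length of the longest branch."""
    t, h, z = [0.0], [1], [0]          # initial values as printed in Algorithm 3
    d = [[0.0] * m]                    # the origin is known to every worker
    for k in range(1, n):
        j = rng.randrange(m)
        t.append(t[-1] + alpha())
        h.append(pruned_height(k, j, t, h, z, d))
        z.append(max(z[-1], h[-1]))
        row = [beta() for _ in range(m)]
        row[j] = 0.0                   # the producer knows its own block at once
        d.append(row)
    return z[-1]


rng = random.Random(2020)
longest = simulate_matrix(200, 10, lambda: rng.expovariate(1.0),
                          lambda: rng.expovariate(10.0), rng)
```

The loop in `pruned_height` walks backwards from the most recent block and stops as soon as the cumulative maximum shows that no older block can be taller than the best candidate found so far, which is what keeps the average cost low in slow and fast systems.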


4.3 An Approximation Algorithm for Unbounded Number of 
Workers 


Algorithms 2 and 3 compute the value of $p_n$ for a fixed number $m$ of workers.
Both algorithms can be used to compute $p_n$ for different values of $m$. However,
the time complexity of these two algorithms heavily depends on the value of $m$,
which presents a practical limitation when faced with the task of analyzing large
blockchain systems. This section introduces an algorithm for approximating $\bar{p}_n$
for an unbounded number of workers. It also presents formal observations that
support the proposed approximation.

Recall that $p_n$ can be used as a measure of efficiency in terms of the
proportion of valid blocks that have been produced up to step $n$ in the blockchain
$T_n = (V_n, E_n)$. Formally:

$$p_n = \frac{\max\{\mathrm{dist}(0, u) \mid u \in V_n\}}{|V_n|}.$$

This definition assumes a fixed number of workers. That is, $p_n$ can be written as
$p_{m,n}$ to represent the proportion of valid blocks in $T_n$ with $m$ workers. For the
analysis of large blockchains, the challenge is to find an efficient way to estimate
$p_{m,n}$ for large values of $m$ and $n$. In other words, to find an efficient algorithm
for approximating the random variables $p^*_n$ and $p^*$ defined as:

$$p^*_n = \lim_{m \to \infty} p_{m,n} \quad \text{and} \quad p^* = \lim_{n \to \infty} p^*_n = \lim_{m,n \to \infty} p_{m,n}.$$

The proposed approach modifies Algorithm 3 by suppressing the matrix $d$. The
idea is to replace the need for computing $d_{i,j}$ by an approximation based on
the random variable $\beta$ and the length of the blockchain $h_k$ in each iteration
of the main loop. Note that the first row can be assumed to be 0 wherever it
appears because $d_{0,j} = 0$ for all $j$. For the remaining rows, an approximation is
introduced by observing that if an element $X_m$ is chosen at random from the
matrix $d$ of size $(n - 1) \times m$ (i.e., matrix $d$ without the first row), then the
cumulative distribution function of $X_m$ is given by

$$P(X_m \le r) = \begin{cases} 0, & r < 0, \\ \frac{1}{m} + \frac{m-1}{m} P(\beta() \le r), & r \ge 0, \end{cases}$$

where $\beta()$ is a sample from $\beta$. This is because the elements $X_m$ of $d$ are either
samples from $\beta$, whose domain is $\mathbb{R}_{\ge 0}$, or 0 with a probability of $1/m$ since there
is one zero per row. Therefore, given that the following functional limit converges
uniformly (see Theorem 1 below),

$$\lim_{m \to \infty} \bigl(r \mapsto P(X_m \le r)\bigr) = \bigl(r \mapsto P(\beta() \le r)\bigr),$$

each $d_{i,j}$ can be approximated by directly sampling the distribution $\beta$. As a
result, Algorithm 4 can be used for computing $h_k$ by replacing $d_{i,j}$ with $\beta()$.
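
The convergence of the distribution of a random matrix entry to that of the delay distribution can be illustrated numerically. The sketch below assumes an exponential delay with mean 0.1 (the convergence itself does not depend on this choice) and measures the largest gap between both cumulative distribution functions on a grid; the function names are hypothetical.

```python
import math


def cdf_beta(r: float, mean_beta: float) -> float:
    """CDF of an exponential delay (an assumption made only for this example)."""
    return 1.0 - math.exp(-r / mean_beta) if r >= 0 else 0.0


def cdf_entry(r: float, m: int, mean_beta: float) -> float:
    """CDF of a random entry X_m: zero with probability 1/m, a delay sample otherwise."""
    if r < 0:
        return 0.0
    return 1.0 / m + (m - 1) / m * cdf_beta(r, mean_beta)


grid = [i / 100 for i in range(0, 501)]
gaps = {
    m: max(abs(cdf_entry(r, m, 0.1) - cdf_beta(r, 0.1)) for r in grid)
    for m in (2, 10, 100, 1000)
}
```

On this grid the largest gap is attained at r = 0 and equals 1/m, so it shrinks uniformly as the number of workers grows.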


Theorem 1. Let $f_k(r) := P(X_k \le r)$ and $g(r) := P(\beta() \le r)$. The functional
sequence $\{f_k\}_{k=1}^{\infty}$ converges uniformly to $g$.


Proof. Let $\epsilon > 0$. Define $n := \lceil 1/\epsilon \rceil$ and let $k$ be any integer $k > n$. Then

$$\sup_{r} |f_k(r) - g(r)| = \sup_{r \ge 0} \left| \frac{1}{k} + \left( \frac{k-1}{k} - 1 \right) P(\beta() \le r) \right| = \sup_{r \ge 0} \frac{1}{k} \bigl( 1 - P(\beta() \le r) \bigr) \le \frac{1}{k} < \frac{1}{n} \le \epsilon.$$

Using Theorem 1, the need for the bookkeeping matrix $d$ and the selection of
a random worker $j$ are discarded from Algorithm 3, resulting in Algorithm 5. The
proposed algorithm computes an approximation of $\lim_{m \to \infty} p_{m,n}$ in which
the matrix entries $d_{i,j}$ are replaced by samples from $\beta$, each time they are needed,
thus ignoring the arguably negligible hysteresis effects.


Algorithm 5: Approximation for lim_{m→∞} p_{m,n} simulation

t_0, h_0, z_0 ← 0, 0, 0
for k ← 1, ..., n−1 do
    t_k ← t_{k−1} + α()
    h_k ← 1 + max({h_i | i < k ∧ t_i + β() < t_k} ∪ {1})    (Algorithm 4*)
    z_k ← max(z_{k−1}, h_k)
end
return z_{n−1}

Algorithm 4* stands for Algorithm 4 with β() instead of d_{i,j} (approximation)


The time complexity of Algorithm 5 implemented by using Algorithm 4 with
$\beta()$ instead of $d_{i,j}$ is $O(n^2)$ in the worst case and its space complexity is $O(n)$.
With the pruning of Algorithm 4, the average time complexity drops below
$O(n + n\bar{\beta}/\bar{\alpha})$ according to experimentation. This complexity can be considered
$O(n)$ as long as $\bar{\beta}$ is not much larger than $\bar{\alpha}$ ($\bar{\beta} \not\gg \bar{\alpha}$).
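
Algorithm 5 admits a compact Python sketch in which the pruning loop of Algorithm 4 is inlined and every matrix entry is replaced by a fresh delay sample. The function name `simulate_unbounded` is an assumption, and the parameters of the usage example (exponential distributions with means 1 and 0.1) are illustrative.

```python
import random
from typing import Callable, List


def simulate_unbounded(n: int, alpha: Callable[[], float],
                       beta: Callable[[], float]) -> int:
    """Sketch of Algorithm 5: longest-branch length using fresh beta() samples."""
    t: List[float] = [0.0]
    h: List[int] = [0]                 # as printed; h[0] is never read by the loop below
    z: List[int] = [0]
    for k in range(1, n):
        t.append(t[-1] + alpha())
        x, i = 1, k - 1                # Algorithm 4 with beta() in place of d[i][j]
        while i > 0 and x < z[i]:
            if t[i] + beta() < t[k] and h[i] > x:
                x = h[i]
            i -= 1
        h.append(1 + x)
        z.append(max(z[-1], h[-1]))
    return z[-1]


rng = random.Random(2020)
estimate = simulate_unbounded(
    1000, lambda: rng.expovariate(1.0), lambda: rng.expovariate(10.0)
) / 1000
```

No worker index and no matrix are needed, so the memory footprint is linear in n regardless of the size of the network being approximated.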


5 Empirical Evaluation of Blockchain Efficiency 


This section presents an experimental evaluation of blockchain efficiency in 
terms of the proportion of valid blocks produced by the workers for the global 
blockchain. The model in Section 3 is used as the mathematical framework, 
while the algorithms in Section 4 are used for experimental evaluation on that 
framework. The main claim is that, under certain conditions, the efficiency of a 
blockchain can be expressed as a ratio between $\bar{\alpha}$ and $\bar{\beta}$. Experimental
evaluations provide evidence on why Algorithm 5 (the approximation algorithm for
computing the proportion of valid blocks in a blockchain system with an
unbounded number of workers) is an accurate tool for computing the measure of
efficiency $p^*$.

Note that the speed of a blockchain can be characterized by the relationship 
between the expected values of $\alpha$ and $\beta$.


Definition 5. Let $\alpha$ and $\beta$ be the distributions according to Definition 3. A
blockchain is classified as:


- slow if $\bar{\alpha} \gg \bar{\beta}$,
- chaotic if $\bar{\alpha} \ll \bar{\beta}$, and
- fast if $\bar{\alpha} \approx \bar{\beta}$.

Definition 5 captures the intuition about the behavior of a global blockchain
in terms of how alike the times required for producing a block and for local
block synchronization are. Note that the Bitcoin implementation is classified as a
slow blockchain system because the time between the creation of two consecutive 
blocks is much larger than the time it takes for local blockchains to synchronize. 
In chaotic blockchains, a dwarfing synchronization time means that basically no 
(or relatively little) synchronization is possible, resulting in a blockchain in which 
rarely any block would be part of “the” valid chain of blocks. A fast blockchain, 
however, is one in which both the times for producing a block and broadcasting 
a message are similar. The two-fold goal of this section is first, to analyze the 
behavior of p* for the three classes of blockchains, and second, to understand 
how the trade-off between production speed and communication time affects the 
efficiency of the data structure by means of a formula. 

In favor of readability, the experiments presented next identify Algorithms 3
and 5 as $A_m$ and $A_\infty$, respectively. Furthermore, the claims and experiments
assume that the distribution $\alpha$ is exponential, which holds true for proof-of-work
systems.


Claim 1 Unless the system is chaotic, the hysteresis effect of the matrix entries
$d_{i,j}$ in $A_m$ is negligible. Moreover, $\lim_{m \to \infty} A_m(n) = A_\infty(n)$.


Note that Theorem 1 implies that if the hysteresis effect of the random variables
$d_{i,j}$ is negligible, then Algorithm 5 is a good enough approximation of
Algorithm 3. However, it does not prove that this assertion holds in general. Ex- 
perimental evaluation suggests that this is indeed the case, as stated in Claim 1. 


[Figure 3 plots. Panel (a): algorithm output against m (0 to 200), with the series
"m workers - average", "m workers - 25% to 75%", and "∞ workers - average".
Panel (b): distribution of the algorithm output for 100 workers and for ∞ workers.]

(a) Evolution of $A_m$ to $A_\infty$ as $m$ grows. Simulation runs contain at least 100
samples per point.

(b) High similarity between the p.d.f. of $A_{100}$ and $A_\infty$. Simulation runs contain at
least 1000 samples in total.

Fig. 3: Algorithmic simulation of $n = 1000$ blocks with $\bar{\alpha} = 1$, $\bar{\beta} = 0.1$, and $\beta$
exponential. The number of samples and the size of the blockchain $n$ are chosen
such that the execution time on a standard CPU lies below a few seconds.

Figure 3 summarizes the average output of $A_m$ and the region that contains
half of these outputs, for several values of $m$. All outputs seem to approach that
of $A_\infty$, not only for the expected value (Figure 3.(a)), but also in terms of the
generated p.d.f. (Figure 3.(b)). Similar results were obtained with several
distribution functions for $\beta$. In particular, the exponential, chi-squared, and gamma
probability distribution functions were used (with $k \in \{1, 1.5, 2, 3, 5, 10\}$), all
with different mean values. The resulting plots are similar to the ones depicted
in Figure 3.

As the quotient $\bar{\beta}/\bar{\alpha}$ grows beyond 1, the convergence of $A_m$ becomes much
slower and the approximation error is noticeable. An example is depicted in
Figure 4, where a blockchain system produces on average 10 blocks during the
transmission of a synchronization message (i.e., the system is classified as chaotic).
Even after considering 1000 workers, the shape of the p.d.f. is shifted
considerably. The error can be due to: (i) the hysteresis effect that is ignored by $A_\infty$; or
(ii) the slow rate of convergence. In any case, the output of this class of systems
is very low, making them unstable and useless in practice.


[Figure 4 plots. Left panel: algorithm output against m (0 to 200), with the series
"m workers - average", "m workers - 25% to 75%", and "∞ workers - average".
Right panel: distribution of the algorithm output for 1000 workers and for ∞ workers.]

Fig. 4: For chaotic systems, the convergence is slow and the approximation error 
is large: with 1000 workers there is still an average output shift of around 0.005. 


An intuitive conclusion about blockchain efficiency and speed of block pro- 
duction is that slower systems tend to be more efficient than faster ones. That 
is, faster blockchain systems have a tendency to overproduce blocks that will not 
be valid. 


Claim 2 If the system is either slow or fast, then

$$\bar{p}^* = \frac{\bar{\alpha}}{\bar{\alpha} + \bar{\beta}}.$$


Figure 5 presents an experimental evaluation of the proportion of valid blocks 
in a blockchain in terms of the ratio 6/a. For the left and right plots, the 
horizontal axis represents how fast blocks are produced in comparison with how 
slow synchronization is achieved. If the system is slow, then efficiency is high 
because most newly produced blocks tend to be valid. If the system is fast, 
however, then efficiency is balanced because the newly produced blocks are 
equally likely to become valid or invalid. Finally, note that for fast 
and chaotic blockchains, say for 10^-1 < β/α, there is still a region in which 
efficiency is arguably high. As a matter of fact, even if synchronization of local 
blockchains takes on average a tenth of the time it takes to produce a block, in 
general, the proportion of blocks that become valid is almost 90%. In practice, 
this observation can bridge the gap between the current use of blockchains as 
slow systems and the need for faster blockchains. 
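
The relation between speed and efficiency discussed above can be sketched numerically. The following minimal Python example evaluates the ratio of the mean block production time to the sum of the mean production and synchronization times, which is the quantity that Claim 2 proposes for the expected proportion of valid blocks. The function name `valid_block_fraction` and its parameter names are illustrative assumptions; they are not part of the simulation algorithms of this paper.

```python
def valid_block_fraction(alpha_mean: float, beta_mean: float) -> float:
    """Proportion of valid blocks conjectured by Claim 2.

    alpha_mean: mean time it takes to produce a block (assumed > 0).
    beta_mean:  mean time it takes workers to synchronize (assumed >= 0).
    """
    if alpha_mean <= 0 or beta_mean < 0:
        raise ValueError("alpha_mean must be positive and beta_mean non-negative")
    return alpha_mean / (alpha_mean + beta_mean)


if __name__ == "__main__":
    # Fix alpha_mean = 1 so that beta_mean equals the ratio beta/alpha.
    for ratio in (0.01, 0.1, 1.0, 10.0):
        print(f"beta/alpha = {ratio}: {valid_block_fraction(1.0, ratio):.3f}")
```

When synchronization takes a tenth of the block production time, the sketch gives roughly 0.91, in line with the "almost 90%" observation; when both means are equal it gives 0.5.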


[Figure 5 plots: left and right panels, each showing the average proportion of valid blocks.]

Fig. 5: Effect of speed on the proportion of valid blocks. 


6 Related Work and Concluding Remarks 


A comprehensive account of the vast literature on complex networks is beyond 
the scope of this work. The aim here is more modest, namely, the focus is on re- 
lated work proposing and using formal and semi-formal algorithmic approaches 
to evaluate properties of blockchain systems. There are a number of recent stud- 
ies that focus on the analysis of blockchain properties with respect to meta- 
parameters. Some of them are based on network and node simulators. Other 
studies conceptualize different metrics and models that aim to reduce the anal- 
ysis to the essential parts of the system. 

In [10], A. Gervais et al. introduce a quantitative framework to analyze the 
security and performance implications of various consensus and network param- 
eters of proof-of-work blockchains. They devise optimal adversarial strategies 
for several attack scenarios while taking into account network propagation. Ulti- 
mately, their approach can be used to compare the tradeoffs between blockchain 
performance and its security provisions. Y. Aoki et al. [2] propose SimBlock, a 
blockchain network simulator in which blocks, nodes, and the network itself can 
be instantiated by using a comprehensive collection of parameters, including the 
propagation delay between nodes. Towards a similar goal, J. Kreku et al. [19] 
show how to use the Absolut simulation tool [28] for prototyping blockchains 
in different environments and finding optimal performance, given some param- 
eters, in constrained platforms such as Raspberry Pi and Nvidia Jetson Tk1. 
R. Zhang and B. Preneel [29] introduce a multi-metric evaluation framework to 
quantitatively analyze proof-of-work protocols. Their systemic security analysis 
in seven of the most representative and influential alternative blockchain designs 
concludes that none of them outperforms the so-called Nakamoto Consensus in 
terms of either the chain quality or attack resistance. All these efforts have in 
common that simulation-based analysis is used to understand non-functional re- 
quirements of blockchain designs such as performance and security, up to a high 
degree of confidence. However, in most of the cases the concluding results are 
tied to a specific implementation of the blockchain architecture. The model and 
algorithms presented in this work can be used to analyze each of these scenarios 
in a more abstract fashion by using appropriate parameters for simulating the 
blockchain growth and synchronization. 


An alternative approach for studying blockchains is through formal seman- 
tics. G. Rosu [24] takes a novel approach to the analysis of blockchain systems 
by focusing on the formal design, implementation, and verification of blockchain 
languages and virtual machines. His approach uses continuation-based formal se- 
mantics to later analyze reachability properties of the blockchain evolution with 
different degrees of abstraction. In this direction of research, E. Hildenbrandt et 
al. [14] present KEVM, an executable formal specification of Ethereum’s virtual 
machine that can be used for rapid prototyping, as well as a formal interpreter of 
Ethereum’s programming languages. C. Kaligotla and C. Macal [17] present an 
agent-based model of a blockchain system in which the behavior and decisions 
made by agents are detailed. They are able to implement a generalized simu- 
lation and a measure of blockchain efficiency from an agent choice and energy 
cost perspective. Finally, J. Göbel et al. [11] use Markov models to establish 
that some attack strategies, such as selfish-mine, cause the rate of production 
of orphan blocks to increase. The research presented in this manuscript uses ran- 
dom networks to model the behavior of blockchain systems. As future work, the 
proposed model and algorithms can be specified in a rewrite-based framework 
such as rewriting logic [20], so that the rule-based approach in [24,14] and the 
agent-based approach in [17] can both be extended to the automatic analysis of 
(probabilistic) temporal properties of blockchains. Moreover, as it is usual in a 
random network approach, topological properties of blockchain systems can be 
studied with the help of the model proposed in this manuscript. 


In general, this paper differs from the above studies in the following aspects. 
The proposed analysis is not based on an explicit low-level simulation of a net- 
work or protocol; it does not explore the behavior of blockchain systems under 
the presence of attackers. Instead, this work simulates the behavior of blockchain 
efficiency from a meta-level perspective and investigates the strength of the sys- 
tem with respect to shortcomings inherent in its design. Therefore, the proposed 
analysis differs from [10,2,19,29] and is rather closely related to studies which 
consider the core properties of blockchain systems prior to attacks [17,29]. The 
bounds for the meta-parameters are more conservative and less secure, compared 
to scenarios in which the presence of attackers is taken into account. Finally, with 
respect to studying blockchains through formal semantics, the proposed analysis 
is able to consider an artificial but convenient scenario of having an infinite num- 
ber of concurrent workers. Formal semantics, as well as other related simulation 
tools, cannot currently handle such scenarios. 

This paper presented a network model for blockchains and showed how the 
proposed simulation algorithms can be used to analyze the efficiency (in terms of 
production of valid blocks) of blockchain systems. The model is parametric on: 
(i) the number of workers (or nodes); and (ii) two probability distributions gov- 
erning the time it takes to produce a new block and the time it takes the workers 
to synchronize their local copies of the blockchain. The simulation algorithms 
are probabilistic in nature and can be used to compute the expected value of 
several metrics of interest, both for a fixed and unbounded number of workers, 
via Monte Carlo simulations. It is proven, under reasonable assumptions, that 
the fast approximation algorithm for an unbounded number of workers yields ac- 
curate estimates in relation to the other two exact (but much slower) algorithms. 
Claims, supported by extensive experimentation, have been proposed, including 
a formula to measure the proportion of valid blocks produced in a blockchain in 
terms of the two probability distributions of the model. The model, algorithms, 
and experiments provide insights and useful mathematical tools for specifying, 
simulating, and analyzing the design of fast blockchain systems in the years to 
come. 

Future work should pursue an analytical treatment of the experimental 
observations contributed in this work. This includes proving the two claims 
in Section 5: first, that hysteresis effects are negligible unless the system is 
extremely fast; second, that the expected proportion of valid blocks in a blockchain 
system is given by ᾱ/(ᾱ + β̄), where ᾱ and β̄ are the means of the probability 
distributions governing block production and communication times, respectively. 
Furthermore, the generalization of the claims to non-proof-of-work schemes, i.e., 
to different probability distribution functions for specifying the time it takes to 
produce a new block, may also be considered. Finally, the study of different forms 
of attack on blockchain systems can be pursued with the help of the proposed 
model. 
Abstract. Functional specifications describe what program components can do: 
the sufficient conditions to invoke components’ operations. They allow us to 
reason about the use of components in a closed world setting, where compon- 
ents interact with known client code, and where the client code must establish the 
appropriate pre-conditions before calling into a component. 

Sufficient conditions are not enough to reason about the use of components in an 
open world setting, where components interact with external code, possibly of 
unknown provenance, and where components may evolve over time. In this open 
world setting, we must also consider the necessary conditions, i.e. what are the 
conditions without which an effect will not happen. 

In this paper we propose the Chainmail specification language for writing hol- 
istic specifications that focus on necessary conditions (as well as sufficient condi- 
tions). We give a formal semantics for Chainmail, and discuss several examples. 
The core of Chainmail has been mechanised in the Coq proof assistant. 


1 Introduction 


Software guards our secrets, our money, our intellectual property, our reputation [47]. 
We entrust personal and corporate information to software which works in an open 
world, where it interacts with third party software of unknown provenance, possibly 
buggy and potentially malicious. 

This means we need our software to be robust: to behave correctly even if used by 
erroneous or malicious third parties. We expect that our bank will only make payments 
from our account if instructed by us, or by somebody we have authorised, that space on 
a web page given to an advertiser will not be used to obtain access to our bank details 
[43], or that a concert hall will not book the same seat more than once. 

While language mechanisms such as constants, invariants, object capabilities [40], 
and ownership [14] make it possible to write robust programs, they cannot ensure that 
programs are robust. Ensuring robustness is difficult because it means different things 
for different systems: perhaps that critical operations should only be invoked with the 
requisite authority; perhaps that sensitive personal information should not be leaked; 
or perhaps that a resource belonging to one user should not be consumed by another. 
To ensure robustness, we need ways to specify what robustness means for a particular 
class Safe{ 
   field treasure 
   field secret 
   method take(scr){ 
      if (secret==scr) then { 
         t=treasure 
         treasure = null 
         return t } } 
} 


class Safe{ 
   field treasure 
   field secret 
   method take(scr){ 
      ...as version 1... 
   } 
   method set(scr){ 
      secret=scr } 
} 


Figure 1. Two Versions of the class Safe 


program, and ways to demonstrate that the particular program adheres to its specific 
robustness requirements. 

Consider the code snippets from Fig. 1. Objects of class Safe hold a treasure 
and a secret, and only the holder of the secret can remove the treasure from the safe. 
We show the code in two versions; both have the same method take, and the second 
version has an additional method set. We assume a dynamically typed language (so 
that our results are applicable to both statically and dynamically typed settings)⁴; that 
fields are private in the sense of Java (i.e. only methods of that class may read or write 
these fields); and that addresses are unforgeable (so there is no way to guess a secret). 
A classical Hoare triple describing the behaviour of take would be: 


(ClassicSpec) ≜ 
   method take(scr) 
   PRE:   this:Safe 
   POST:  scr=this.secret_pre → this.treasure=null 
            ∧ 
          scr≠this.secret_pre → ∀s:Safe. s.treasure=s.treasure_pre 


(ClassicSpec) expresses that knowledge of the secret is sufficient to remove the 
treasure, and that take cannot remove the treasure unless the secret is provided. But 
it cannot preclude that Safe — or some other class, for that matter — contains more 
methods which might make it possible to remove the treasure without knowledge of the 
secret. This is the problem with the second version of Safe: it satisfies (ClassicSpec), 
but is not robust, as it is possible to overwrite the secret of the Safe and then use 
it to remove the treasure. To express robustness requirements, we introduce holistic 
specifications, and require that: 


(HolisticSpec) ≜ 
   ∀s.[ s:Safe ∧ s.treasure ≠ null ∧ will(s.treasure = null) 
        → ∃o.[ external(o) ∧ (o access s.secret) ] ] 


(HolisticSpec) mandates that for any safe s whose treasure is not null, if some 
time in the future its treasure were to become null, then at least one external object 
(i.e. an object whose class is not Safe) in the current configuration has direct access 


4 We do not depend on the additional safety static typing provides, so we assume only a dynam- 
ically typed language. 
to s’s secret. This external object need not have caused the change in s.treasure 
but it would have (transitively) passed access to the secret which ultimately did cause 
that change. Both classes in Fig. 1 satisfy (ClassicSpec), but the second version does 
not satisfy (HolisticSpec). 
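
The difference between the two versions of Safe can be illustrated with a minimal Python sketch. It is an assumption-laden rendering rather than the paper's own code: Python has no private fields, so the leading underscores only signal privacy by convention, and object identity (`is`) stands in for unforgeable addresses. The names `SafeV1`, `SafeV2` and `attack` are hypothetical.

```python
class SafeV1:
    """First version of Safe: the treasure leaves only via take(secret)."""

    def __init__(self, treasure, secret):
        self._treasure = treasure  # private by convention only
        self._secret = secret

    def take(self, scr):
        # Identity comparison models unforgeable addresses.
        if self._secret is scr:
            t = self._treasure
            self._treasure = None
            return t
        return None


class SafeV2(SafeV1):
    """Second version: the extra method set lets anyone overwrite the secret."""

    def set(self, scr):
        self._secret = scr


def attack(safe):
    """An external client that never had access to the original secret."""
    forged = object()
    if hasattr(safe, "set"):
        safe.set(forged)
    return safe.take(forged)
```

Under these assumptions, `attack` obtains nothing from a `SafeV1`, but empties a `SafeV2`: the treasure becomes null although no external object had access to the original secret, which is exactly what (HolisticSpec) rules out.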

In this paper we propose Chainmail, a specification language to express holistic 
specifications. The design of Chainmail was guided by the study of a sequence of 
examples from the object-capability literature and the smart contracts world: the mem- 
brane [17], the DOM [20,59], the Mint/Purse [40], the Escrow [18], the DAO [12,15] 
and ERC20 [61]. As we worked through the examples, we found a small set of language 
constructs that let us write holistic specifications across a range of different contexts. 
Chainmail extends traditional program specification languages [31,37] with features 
which talk about: 


Permission: Which objects may have access to which other objects; this is central 
since access to an object grants access to the functions it provides. 

Control: Which objects called functions on other objects; this is useful in identifying 
the causes of certain effects, e.g. funds can only be reduced if the owner called a 
payment function. 

Time: What holds some time in the past, the future, and what changes with time. 

Space: Which parts of the heap are considered when establishing some property, or 
when performing program execution; a concept related to, but different from, memory 
footprints and separation logics. 

Viewpoint: Which objects and which configurations are internal to our component, 
and which are external to it; a concept related to the open world setting. 


While many individual features of Chainmail can be found in other work, their 
power and novelty for specifying open systems lies in their careful combination. The 
contributions of this paper are: 


— the design of the holistic specification language Chainmail, 
— the semantics of Chainmail, 
— a Coq mechanisation of the core of Chainmail. 


The rest of the paper is organised as follows: Section 2 gives an example from 
the literature which we will use to elucidate key points of Chainmail. Section 3 presents the 
Chainmail specification language. Section 4 introduces the formal model underlying 
Chainmail, and then Section 5 defines the semantics of Chainmail’s assertions. Section 6 
discusses our design, Section 7 considers related work, and Section 8 concludes. We 
relegate key points of exemplar problems and various details to appendices which are 
available at [1]. 


2 Motivating Example: The Bank 


As a motivating example, we consider a simplified banking application taken from 
the object capabilities literature [41]: Accounts belong to Banks and hold money 
(balances); with access to two Accounts of the same Bank one can transfer any 
amount of money from one to the other. This example has the advantage that it requires 
several objects and classes. 

We will not show the code here (see appendix C), but suffice it to say that class 
Account has methods deposit(src, amt) and makeAccount(amt) (i.e. a 
method called deposit with two arguments, and a method called makeAccount 
with one argument). Similarly, Bank has method newAccount(amt). Moreover, 
deposit requires that the receiver and first argument (this and src) are Accounts 
and belong to the same bank, that the second argument (amt) is a number, and that 
src’s balance is at least amt. If this condition holds, then amt gets transferred from 
src to the receiver. The function makeAccount returns a fresh Account with 
the same bank, and transfers amt from the receiver Account to the new Account. 
Finally, the function newAccount when run by a Bank creates a new Account with 
the corresponding amount of money in it. It is not difficult to give formal specifications 
of these methods in terms of pre- and post-conditions. 

However, what if the bank provided a steal method that emptied out every ac- 
count in the bank into a thief’s account? The critical problem is that a bank implement- 
ation including a steal method could meet the functional specifications of deposit, 
makeAccount, and newAccount, and still allow the clients’ money to be stolen. 
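
The problem can be made concrete with a small Python sketch. It is not the code of appendix C; the class layout, the `accounts` list, and the check that `amt` is non-negative are assumptions made only for illustration. The point is that `deposit` and `new_account` behave as their functional specifications demand, and yet `steal` moves money without any call to `deposit`.

```python
class Account:
    def __init__(self, bank, balance=0):
        self.my_bank = bank
        self.balance = balance

    def deposit(self, src, amt):
        """Transfer amt from src to this account if the precondition holds."""
        same_bank = isinstance(src, Account) and src.my_bank is self.my_bank
        if same_bank and 0 <= amt <= src.balance:
            src.balance -= amt
            self.balance += amt


class Bank:
    def __init__(self):
        self.accounts = []  # hypothetical bookkeeping, not from the paper

    def new_account(self, amt):
        account = Account(self, amt)
        self.accounts.append(account)
        return account

    def steal(self, thief):
        """Empties every other account into the thief's account."""
        for account in self.accounts:
            if account is not thief:
                thief.balance += account.balance
                account.balance = 0
```

A client holding only its own account can do nothing about `steal`: after `bank.steal(thief)` every other balance is zero, although `deposit` was never invoked.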

One obvious solution would be to adopt a closed-world interpretation of specifica- 
tions: we interpret functional specifications as exact in the sense that only implementa- 
tions that meet the functional specification exactly, with no extra methods or behaviour, 
are considered as suitable implementations of the functional specification. The prob- 
lem is that this solution is far too strong: it would for example rule out a bank that 
during software maintenance was given a new method count that simply counted the 
number of deposits that had taken place, or a method notify to enable the bank to 
occasionally send notifications to its customers. 

What we need is some way to permit bank implementations that send notifications 
to customers, but to forbid implementations of steal. The key here is to capture the 
(implicit) assumptions underlying the design of the banking application. We provide 
additional specifications that capture those assumptions. The following three informal 
requirements prevent methods like steal: 


1. An account’s balance can be changed only if a client calls the deposit method 
with the account as the receiver or as an argument. 

2. An account’s balance can be changed only if a client has access to that particular 
account. 

3. The Bank/Account component does not leak access to existing accounts or banks. 


Compared with the functional specification we have seen so far, these requirements 
capture necessary rather than sufficient conditions: Calling the deposit method to 
gain access to an account is necessary for any change to that account taking place. 
The function steal is inconsistent with requirement (1), as it reduces the balance 
of an Account without calling the function deposit. However, requirement (1) is 
not enough to protect our money. We need (2) to avoid an Account’s balance getting 


> Note that our very limited bank specification doesn’t even have the concept of an account 
owner. 
modified without access to the particular Account, and (3) to ensure that such accesses 
are not leaked. 

We can express these requirements through Chainmail assertions. Rather than spe- 
cifying the behaviour of particular methods when they are called, we write assertions 
that range across the entire behaviour of the Bank/Account module: 


(1) ≜ ∀a.[ a:Account ∧ changes(a.balance) → 
           ∃o.[ (o calls a.deposit(_,_)) ∨ (o calls _.deposit(a,_)) ] ] 


(2) ≜ ∀a.∀S:Set.[ a:Account ∧ (will(changes(a.balance)) in S) 
           → ∃o.[ o∈S ∧ external(o) ∧ (o access a) ] ] 


(3) ≜ ∀a.∀S:Set.[ a:Account ∧ 
           (will(∃o.[ external(o) ∧ (o access a) ]) in S) 
           → ∃o′.[ o′∈S ∧ external(o′) ∧ (o′ access a) ] ] 


In the above and throughout the paper, we use an underscore (_) to indicate an existen- 
tially bound variable whose value is of no interest. 


Assertion (1) says that if an account’s balance changes (changes(a.balance)), 
then there must be some client object o that called the deposit method with a as a 
receiver or as an argument ((o calls _.deposit(_))). 

Assertion (2) similarly constrains any possible change to an account’s balance. If at 
some future point the balance changes (will(changes(...))), and if this future change 
is observed with the state restricted to the objects from S (i.e. (... in S)), then at least 
one of these objects (o ∈ S) is external to the Bank/Account system (external(o)) 
and has direct access to that account object ((o access a)). Notice that while the change 
in the balance happens some time in the future, the external object o has access to 
a in the current state. Notice also that the object which makes the call to deposit 
described in (1), and the object which has access to a in the current state described in 
(2), need not be the same: it may well be that the latter passes a reference to a to the 
former (indirectly), which then makes the call to deposit. 

It remains to think about how access to an Account may be obtained. This is the 
remit of assertion (3), which says that if at some time in the future of the state restricted 
to S, some object o which is external has access to some account a, and if a exists in 
the current state, then in the current state some external object o′ from S has access to a. 
Here o and o′ may, but need not, be the same object. Moreover, o′ has to exist and have 
access to a in the current state, but o need not exist in the current state; it may be 
allocated later. Assertion (3) thus gives essential protection when dealing with foreign, 
untrusted code. When an Account is given out to untrusted third parties, assertion (3) 
guarantees that this Account cannot be used to obtain access to further Accounts. 


A holistic specification for the bank account, then, would be a sufficient functional 
specification plus the necessary specifications (1)-(3) from above. This holistic specific- 
ation permits an implementation of the bank that also provides count and notify 
methods, even though the specification does not mention either method. Critically, 
though, the holistic Chainmail specification does not permit an implementation that 
includes a steal method. 
3 Chainmail Overview 


In this section we give a brief and informal overview of some of the most salient 
features of Chainmail; a full exposition appears in Section 5. 


Example Configurations We will illustrate these features using the Bank/Account 
example from the previous section. We use the runtime configurations σ1 and σ2 shown 
in the left and right diagrams in Figure 2. In both diagrams the rounded boxes depict 
objects: green for those from the Bank/Account component, and grey for the 
“external”, “client” objects. The transparent green rectangle shows which objects are 
contained by the Bank/Account component. The object at 1 is a Bank, those at 2, 
3 and 4 are Accounts, and those at 91, 92, 93 and 94 are “client” objects which 
belong to classes different from those from the Bank/Account module. 

Each configuration represents one alternative implementation of the Bank object. 
Configuration σ1 may arise from execution using a module M_BA1, where Account 
objects have a field myBank pointing to their Bank, and an integer field balance; 
the code can be found in appendix C Fig. 3. Configuration σ2 may arise from execution 
using a module M_BA2, where Accounts have a myBank field, Bank objects have a 
ledger implemented through a sequence of Nodes, each of which has a field pointing 
to an Account, a field balance, and a field next; the code can be found in 
appendix C Figs. 6 and 4. 


[Diagrams of the runtime configurations σ1 (left) and σ2 (right).] 


Figure 2. Two runtime configurations for the Bank/Account example. 


For the rest, assume variable identifiers b1, a2–a4, and u91–u94 denoting objects 
1, 2–4, and 91–94 respectively for both σ1 and σ2. That is, for i=1 or i=2, σi(b1)=1, 
σi(a2)=2, σi(a3)=3, σi(a4)=4, σi(u91)=91, σi(u92)=92, σi(u93)=93, and σi(u94)=94. 


Classical Assertions talk about the contents of the local variables (i.e. the topmost stack 
frame), and the fields of the various objects (i.e. the heap). For example, the assertion 
a2.myBank = a3.myBank says that a2 and a3 have the same bank. In fact, this assertion 
is satisfied in both σ1 and σ2, written formally as 

   ..., σ1 ⊨ a2.myBank = a3.myBank 


   ..., σ2 ⊨ a2.myBank = a3.myBank. 

The term x:ClassId says that x is an object of class ClassId. For example 
..., σ1 ⊨ a2.myBank:Bank. 

We support ghost fields [11,31], e.g. a4.balance is a physical field in σ1 and a 
ghost field in σ2 since in M_BA2 an Account does not store its balance (as can be 
seen in appendix C Fig. 6). We also support the usual logical connectives, and so, we 
can express assertions such as 

   ∀a.[ a:Account → a.myBank:Bank ∧ a.balance > 0 ]. 


Permission: Access Our first holistic assertion, (x access y), asserts that object x has 
a direct reference to another object y: either one of x’s fields contains a reference to y, 
or the receiver of the currently executing method is x, and y is one of the arguments or 
a local variable. For example: 

   ..., σ1 ⊨ (a2 access b1) 

If σ1 were executing the method body corresponding to the call a2.deposit(a3,360), 
then we would have 

   ..., σ1 ⊨ (a2 access a3). 

That is, during execution of deposit, the object at a2 has access to the object at a3, 
and could, if the method body chose to, call a method on a3, or store a reference to a3 
in its own fields. Access is not symmetric, nor transitive: 

   ..., σ1 ⊭ (a3 access a2), 

   ..., σ2 ⊨ (a2 access* a3),     ..., σ2 ⊭ (a2 access a3). 
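
The distinction between direct and transitive access can be sketched in Python by treating the heap as a directed graph of references. This is a simplification under stated assumptions: only field references are modelled (not the arguments or local variables of the executing method), and the heap `HEAP` below, including the object named `node`, is hypothetical; it is only loosely inspired by the ledger-based configuration and is not read off Figure 2.

```python
from typing import Dict, Set

# A heap maps each object name to the set of objects its fields refer to.
Heap = Dict[str, Set[str]]


def access(heap: Heap, x: str, y: str) -> bool:
    """Direct access: some field of x holds a reference to y."""
    return y in heap.get(x, set())


def access_star(heap: Heap, x: str, y: str) -> bool:
    """Transitive access: y is reachable from x in one or more steps."""
    seen: Set[str] = set()
    stack = [x]
    while stack:
        current = stack.pop()
        for target in heap.get(current, set()):
            if target == y:
                return True
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return False


# Hypothetical heap: accounts point to their bank, the bank's ledger node
# points to account a3.
HEAP: Heap = {"a2": {"b1"}, "a3": {"b1"}, "b1": {"node"}, "node": {"a3"}}
```

In this heap, `a2` accesses `b1` but not the other way round (no symmetry), and `a2` reaches `a3` only transitively, never directly (no transitivity).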


Control: Calls The assertion (x calls y.m(zs)) holds in configurations where a method 
on object x makes a method call y.m(zs), that is, it calls method m with object y as 
the receiver, and with arguments zs. For example, 

   ..., σ3 ⊨ (x calls a2.deposit(a3, 360)) 

means that the receiver in σ3 is x, and that a2.deposit(a3, 360) is the next statement 
to be executed. 


Space: In The space assertion (A in S) establishes validity of A in a configuration 
restricted to the objects from the set S. For example, if object 94 is included in S1 but 
not in S2, then we have 

   ..., σ1 ⊨ ((∃o.(o access a4)) in S1) 

   ..., σ1 ⊭ ((∃o.(o access a4)) in S2). 

The set S in the assertion (A in S) is therefore not the footprint of A; it is more like 
the fuel [2] given to establish that assertion. Note that ..., σ ⊨ (A in S) does not 
imply ..., σ ⊨ A, nor does it imply ..., σ ⊨ (A in S ∪ S′). The other direction of the 
implication does not hold either. 
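
One possible reading of the restriction to a set S can be sketched in Python as follows. The formal definition is given in Section 5 and is not reproduced here; the sketch assumes that restricting a heap keeps only the objects in S and drops references leaving S. The heap contents and the helper names `restrict` and `exists_access` are hypothetical.

```python
from typing import Dict, Set

Heap = Dict[str, Set[str]]


def restrict(heap: Heap, s: Set[str]) -> Heap:
    """Keep only the objects in s, and only the references that stay inside s."""
    return {obj: {t for t in refs if t in s}
            for obj, refs in heap.items() if obj in s}


def exists_access(heap: Heap, target: str) -> bool:
    """Is there some object o in the heap with a direct reference to target?"""
    return any(target in refs for refs in heap.values())


# Hypothetical heap in which only the client object u94 refers to account a4.
HEAP: Heap = {"u94": {"a4"}, "a4": {"b1"}, "b1": set()}
S1 = {"u94", "a4", "b1"}  # contains object 94
S2 = {"a4", "b1"}         # does not contain object 94
```

Under these assumptions, `exists_access(restrict(HEAP, S1), "a4")` holds while `exists_access(restrict(HEAP, S2), "a4")` does not, mirroring the two judgments above.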


Time: Next, Will, Prev, Was We support several operators from temporal logic (next(A), 
will(A), prev(A), and was(A)) to talk about the future or the past in one or more steps. 
The assertion will(A) expresses that A will hold in one or more steps. For example, taking 
σ4 to be similar to σ2, the next statement to be executed to be a2.deposit(a3,360), 
and M_BA2 ⨟ ..., σ4 ⊨ a2.balance = 60, and that M_BA2 ⨟ ..., σ4 ⊨ a3.balance > 360, then 

   M_BA2 ⨟ ..., σ4 ⊨ will(a2.balance = 420). 


The internal module, M_BA2, is needed for looking up the method body of deposit. 
Viewpoint: External The assertion external(x) expresses that the object at x does 
not belong to the module under consideration. For example, 

   M_BA2 ⨟ ..., σ2 ⊨ external(u92),     M_BA2 ⨟ ..., σ2 ⊭ external(a2), 
   M_BA2 ⨟ ..., σ2 ⊭ external(b1.ledger). 

The internal module, M_BA2, is needed to judge which objects are internal or external. 


Change and Authority: We have used changes(...) in our Chainmail assertions in 
section 2, as in changes( a.balance ). Assertions that talk about change, or give con- 
ditions for change to happen are fundamental for security; the ability to cause change is 
called authority in [40]. We can encode change using the other features of Chainmail, 
namely, for any expression e: 

   changes(e) ≜ ∃v.[ e = v ∧ next(¬(e = v)) ]. 
and similarly for assertions. 
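
The encoding of change through the temporal operators can be sketched in Python over a finite trace of states. This is an illustration under simplifying assumptions: the semantics of Chainmail is defined over executions of the underlying language, whereas here a trace is just a list of dictionaries, and the combinator names `next_`, `will` and `changes` are chosen for this sketch only.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Trace = List[State]
# An assertion is evaluated at position i of a trace.
Assertion = Callable[[Trace, int], bool]


def next_(a: Assertion) -> Assertion:
    """next(A): A holds in the immediately following state, if there is one."""
    return lambda tr, i: i + 1 < len(tr) and a(tr, i + 1)


def will(a: Assertion) -> Assertion:
    """will(A): A holds in some strictly later state of the trace."""
    return lambda tr, i: any(a(tr, j) for j in range(i + 1, len(tr)))


def changes(e: Callable[[State], int]) -> Assertion:
    """changes(e): e has some value v now, and next(not (e = v))."""
    def holds(tr: Trace, i: int) -> bool:
        v = e(tr[i])
        return next_(lambda t, j: e(t[j]) != v)(tr, i)
    return holds


# Hypothetical trace: a deposit of 360 from a3 to a2 happens in the last step.
TRACE: Trace = [{"a2": 60, "a3": 360}, {"a2": 60, "a3": 360}, {"a2": 420, "a3": 0}]
balance_a2 = lambda state: state["a2"]
```

On this trace, `changes(balance_a2)` fails at position 0 and holds at position 1, while `will(changes(balance_a2))` already holds at position 0.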


Putting these together We now look at some composite assertions which use several 
features from above. For example, the assertion below says that if the statement to be 
executed is a2.deposit(a3, 60), then the balance of a2 will eventually change: 


   M_BA2 ⨟ ..., σ2 ⊨ (_ calls a2.deposit(a3, 60)) → will(changes(a2.balance)). 


We now look deeper into space assertions, (A in S), which allow us to characterise 
the set of objects which have authority over certain effects (here A). In particular, the 
assertion (will(A) in S) requires two things: i) that A will hold in the future, and ii) 
that all the objects which cause the effect which will make A valid are included in S. 
Knowing who has, and who has not, authority over properties or data is a fundamental 
concern of robustness [40]. Notice that the authority is a set, rather than a single object: 
quite often it takes several objects in concert to achieve an effect. 

Consider assertions (2) and (3) from the previous section. They both have the form 
“will((A in S)) → P(S)”, where P is some property over a set. These assertions 
say that if ever in the future A becomes valid, and if the objects involved in making A 
valid are included in S, then S must satisfy P. Such assertions can be used to restrict 
whether A will become valid. If we have some execution which only involves objects 
which do not satisfy P, then we know that the execution will not ever make A valid. 


In summary, in addition to classical logical connectors and classical assertions over the 
contents of the heap and the stack, our holistic assertions draw from some concepts from 
object capabilities ((_ access _) for permission; (_ calls _._(_)) and changes(_) for 
authority) as well as temporal logic (will(A), was(A) and friends), and the relation of 
our spatial connective ((A in S)) with ownership and effect systems [60,14,13]. 

The next two sections discuss the semantics of Chainmail. Section 4 contains an 
overview of the formal model and section 5 focuses on the most important part of 
Chainmail: assertions. 


4 Overview of the Formal Foundations 


We now give an overview of the formal model for Chainmail. In section 4.1 we intro- 
duce the shape of the judgments used to give semantics to Chainmail, while in section 
4.2 we describe the most salient aspects of an underlying programming language used 
in Chainmail. 


4.1 Chainmail judgments 


Having outlined the ingredients of our holistic specification language, the next question 
to ask is: When does a module M satisfy a holistic assertion A? More formally: when 
does M ⊨ A hold? 

Our answer has to reflect the fact that we are dealing with an open world, where 
M, our module, may be linked with arbitrary untrusted code. To model the open world, 
we consider pairs of modules, M ⨟ M′, where M is the module whose code is supposed to 
satisfy the assertion, and M′ is another module which exercises the functionality of M. 
We call our module M the internal module, and M′ the external module, which represents 
potential attackers or adversaries. 

We can now answer the question: $M \models A$ holds if for all further, potentially 
adversarial, modules M' and in all runtime configurations $\sigma$ which may be observed as 
arising from the execution of the code of M combined with that of M', the assertion A is 
satisfied. More formally, we define: 

$$M \models A \quad\text{if}\quad \forall M'.\ \forall \sigma \in \mathit{Arising}(M \fatsemi M').\ [\ M \fatsemi M', \sigma \models A\ ].$$

Module M' represents all possible clients of M. As it is arbitrarily chosen, it reflects the 
open world nature of our specifications. 

The judgment $M \fatsemi M', \sigma \models A$ means that assertion A is satisfied by $M \fatsemi M'$ and $\sigma$. As 
in traditional specification languages [31,37], satisfaction is judged in the context of a 
runtime configuration $\sigma$; but in addition, it is judged in the context of the internal and 
external modules. These are used to find abstract functions defining ghost fields as well 
as method bodies needed when judging validity of temporal assertions such as $\mathsf{will}\langle \_ \rangle$. 

We distinguish between internal and external modules. This has two uses: First, 
Chainmail includes the “$\mathsf{external}\langle o \rangle$” assertion to require that an object belongs to the 
external module, as in the Bank Account’s assertions (2) and (3) in section 2. Second, we 
adopt a version of visible states semantics [45,25,38], treating all executions within a 
module as atomic. We only record runtime configurations which are external to module 
M, i.e. those where the executing object (i.e. the current receiver) comes from module 
M'. Execution has the form 

$$M \fatsemi M', \sigma \leadsto \sigma'$$

where we ignore all intermediate steps with receivers internal to M. In the next section 
we shall outline the underlying programming language, and define the judgment 
$M \fatsemi M', \sigma \leadsto \sigma'$ and the set $\mathit{Arising}(M \fatsemi M')$. 


4.2 An underlying programming language, Loo 


The meaning of Chainmail assertions is parametric in an underlying object-oriented 
programming language, with modules as repositories of code, classes with fields, methods 
and ghost fields, objects described by classes, a way to link modules into larger ones, 
and a concept of program execution⁶. 


6 We believe that Chainmail can be applied to any language with these features. 
We have developed Loo, a minimal such object-oriented language, which we outline 
in this section. We describe the novel aspects of Loo, and summarise the more 
conventional parts, relegating full, and mostly unsurprising, definitions to Appendix A. 

Modules are central to Loo, as they are to Chainmail. As modules are repositories 
of code, we adopt the common formalisation of modules as maps from class identifiers 
to class definitions, cf. Appendix, Def. 1. We use the terms module and component in 
an analogous manner to class and object respectively. Loo is untyped for several reasons. 
Many popular programming languages are untyped. The external module might be 
untyped, and so it is more general to consider everything as untyped. Finally, a solution 
that works for an untyped language will also apply to a typed language, while the 
converse is not true. 

Class definitions consist of field, method and ghost field declarations, cf. Appendix, 
Def. 2. Method bodies are sequences of statements, which can be field reads or field 
assignments, object creation, method calls, and return statements. Fields are private in 
the sense of C++: they can only be read or written by methods of the current class. This 
is enforced by the operational semantics, cf. Fig. 1. We discuss ghost fields in the next 
section. 

Runtime configurations, $\sigma$, contain all the usual information about execution 
snapshots: the heap, and a stack of frames. Each frame consists of a continuation, contn, 
describing the remaining code to be executed by the frame, and a map from variables 
to values. Values are either addresses or sets of addresses; sets are needed to deal with 
assertions which quantify over sets of objects, such as assertions (1) and (2) from 
section 2. We define one-module execution through a judgment of the form $M, \sigma \leadsto \sigma'$ in 
the Appendix, Fig. 1. 

We define a module linking operator $\circ$ so that $M \circ M'$ is the union of the two modules, 
provided that their domains are disjoint, cf. Appendix, Def. 8. As we said in section 4.1, 
we distinguish between the internal and external module. We consider execution from 
the view of the external module, and treat execution of methods from the internal 
module as atomic. For this, we define two-module execution based on one-module execution 
as follows: 


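Definition 1. Given runtime configurations $\sigma$, $\sigma'$, and a module-pair $M \fatsemi M'$ we define 
execution where M is the internal, and M' is the external module as below: 


- $M \fatsemi M', \sigma \leadsto \sigma'$ if there exist $n \geq 2$ and runtime configurations $\sigma_1, \ldots, \sigma_n$, 
such that 
• $\sigma = \sigma_1$, and $\sigma_n = \sigma'$. 
• $M \circ M', \sigma_i \leadsto \sigma_{i+1}$, for $1 \leq i \leq n-1$ 
• $\mathit{Class}(\mathsf{this})_{\sigma} \notin \mathit{dom}(M)$, and $\mathit{Class}(\mathsf{this})_{\sigma'} \notin \mathit{dom}(M)$, 
• $\mathit{Class}(\mathsf{this})_{\sigma_i} \in \mathit{dom}(M)$, for $2 \leq i \leq n-2$ 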


In the definition above, $\mathit{Class}(x)_{\sigma}$ looks up the class of the object stored at x, cf. 
Appendix, Def. 5. For example, for $\sigma_4$ as in Section 3 whose next statement to be 
executed is a2.deposit(a3, 360), we would have a sequence of configurations $\sigma_{41}$, 
..., $\sigma_{4n}$, $\sigma_5$ so that the one-module execution gives 
$M_{BA2}, \sigma_4 \leadsto \sigma_{41} \leadsto \sigma_{42} \ldots \leadsto \sigma_{4n} \leadsto \sigma_5$. 
This would correspond to an atomic evaluation in the two-module execution: 
$M_{BA2} \fatsemi M', \sigma_4 \leadsto \sigma_5$ (see Fig. 3, where blue stands for $\sigma(\mathsf{this}) \in M_1$, and 
orange for $\sigma(\mathsf{this}) \in M_2$). 

Figure 3. Two Module Execution (Def. 1). a) $M_1 \circ M_2$ b) $M_1 \fatsemi M_2$ c) $M_2 \fatsemi M_1$ 
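
The atomic treatment of internal steps can be illustrated with a small sketch. It is not part of Loo or of the Coq model: it assumes a finite one-module trace in which every configuration is reduced to a label and the class of its current receiver, and the names `two_module_steps`, `internal_classes`, `Client` and `Account` are hypothetical.

```python
def two_module_steps(trace, internal_classes):
    """Collapse a one-module trace into two-module steps.

    `trace` is a list of (label, receiver_class) pairs standing in for the
    configurations of a one-module execution.  Only configurations whose
    receiver class lies outside `internal_classes` (i.e. outside dom(M))
    are visible; each pair of consecutive visible configurations is
    related by a single two-module step.
    """
    visible = [label for label, cls in trace if cls not in internal_classes]
    return list(zip(visible, visible[1:]))


# A toy trace in the spirit of the deposit example: an external client
# calls into the bank module, which runs two internal steps and returns.
trace = [
    ("s4", "Client"),    # external receiver, about to call deposit
    ("s41", "Account"),  # internal step, hidden from the two-module view
    ("s42", "Account"),  # internal step, hidden from the two-module view
    ("s5", "Client"),    # control is back in external code
]
steps = two_module_steps(trace, internal_classes={"Account", "Bank"})
```

Under these assumptions the four one-module configurations collapse into the single observable step from `s4` to `s5`.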

Two-module execution is related to visible states semantics [45], as both filter 
configurations. The difference is that in visible states semantics execution is unfiltered, 
and configurations are filtered only when class invariants are considered, whereas 
two-module execution filters execution itself. The lemma below says that linking 
is associative and commutative, and preserves both one-module and two-module 
execution. 


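Lemma 1 (Properties of linking). For any modules M, M', M'', and M''' and runtime 
configurations $\sigma$ and $\sigma'$ we have: 


- $(M \circ M') \circ M'' = M \circ (M' \circ M'')$ and $M \circ M' = M' \circ M$. 
- $M, \sigma \leadsto \sigma'$, and $M \circ M'$ is defined, implies $M \circ M', \sigma \leadsto \sigma'$. 
- $M \fatsemi M', \sigma \leadsto \sigma'$ implies $(M \circ M'') \fatsemi (M' \circ M'''), \sigma \leadsto \sigma'$. 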
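
The first item of the lemma can be checked on a toy encoding of modules. The sketch below assumes that a module is a Python dictionary from class identifiers to class definitions (here placeholder strings); `link` and the three example modules are hypothetical names chosen for illustration.

```python
def link(m1, m2):
    """Union of two modules; undefined (raises) if their domains overlap."""
    clash = m1.keys() & m2.keys()
    if clash:
        raise ValueError(f"linking undefined, shared class identifiers: {sorted(clash)}")
    return {**m1, **m2}


# Placeholder modules: class identifiers mapped to dummy class definitions.
bank = {"Bank": "class Bank {...}", "Account": "class Account {...}"}
client = {"Client": "class Client {...}"}
logger = {"Logger": "class Logger {...}"}

commutes = link(bank, client) == link(client, bank)
associates = link(link(bank, client), logger) == link(bank, link(client, logger))
```

Because dictionary equality ignores insertion order, both comparisons hold; linking a module with itself raises, mirroring the disjointness side condition.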


We can now answer the question as to which runtime configurations are pertinent 
when judging a module’s adherence to an assertion. Initial configurations are those 
whose heap has only one object, of class Object, and whose stack has one frame, 
with arbitrary continuation. Arising configurations are those that can be reached by 
two-module execution, starting from any initial configuration. 


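Definition 2 (Initial and Arising Configurations). Initial and arising configurations are defined as follows: 


- $\mathit{Initial}((\psi, \chi))$, if $\psi$ consists of a single frame $\phi$ with $\mathit{dom}(\phi) = \{\mathsf{this}\}$, 
and there exists some address $\alpha$, such that $\lfloor \mathsf{this} \rfloor_{\phi} = \alpha$, and $\mathit{dom}(\chi) = \{\alpha\}$, and 
$\chi(\alpha) = (\mathsf{Object}, \emptyset)$. 

- $\mathit{Arising}(M \fatsemi M') = \{\ \sigma \mid \exists \sigma_0.\ [\ \mathit{Initial}(\sigma_0) \wedge M \fatsemi M', \sigma_0 \leadsto^* \sigma\ ]\ \}$ 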
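
Read operationally, the set of arising configurations is the reflexive-transitive closure of two-module execution from the initial configurations. In general this set is infinite; the following sketch only illustrates the shape of the definition for a hypothetical finite step relation, with configurations abstracted to names. The function `arising` and the table `toy_steps` are assumptions made for the example.

```python
from collections import deque


def arising(initial, step):
    """Configurations reachable from `initial` in zero or more steps."""
    seen = set(initial)
    queue = deque(initial)
    while queue:
        sigma = queue.popleft()
        for successor in step(sigma):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


# Hypothetical finite two-module step relation over abstract configurations.
toy_steps = {"s0": ["s1", "s2"], "s1": ["s3"], "s2": ["s3"], "s3": [], "s9": ["s0"]}
reachable = arising(["s0"], lambda sigma: toy_steps.get(sigma, []))
```

The configuration `s9` can step to `s0` but is not reachable from it, so it does not arise.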


5 Assertions 


Chainmail assertions (details in appendix B.3) consist of (pure) expressions e, com- 
parisons between expressions, classical assertions about the contents of heap and stack, 
the usual logical connectives, as well as our holistic concepts. In this section we fo- 
cus on the novel, holistic, features of Chainmail (permission, control, time, space, and 
viewpoint), as well as our wish to support some form of recursion while keeping the 
logic of assertions classical. 
5.1 Satisfaction of Assertions - Access, Control, Space, Viewpoint 


Permission expresses that an object has the potential to call methods on another object, 
and to do so directly, without help from any intermediary object. This is the case when 
the two objects are aliases, or the first object has a field pointing to the second object, or 
the first object is the receiver of the currently executing method and the second object 
is one of the arguments or a local variable. Interpretations of variables and paths, $\lfloor \ldots \rfloor_{\sigma}$, 
are defined in the usual way (appendix Def. 5). 


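Definition 3 (Permission). For any modules M, M', variables x and y, we define 


- $M \fatsemi M', \sigma \models \langle x\ \mathsf{access}\ y \rangle$ if $\lfloor x \rfloor_{\sigma}$ and $\lfloor y \rfloor_{\sigma}$ are defined, and 
• $\lfloor x \rfloor_{\sigma} = \lfloor y \rfloor_{\sigma}$, or 
• $\lfloor x.f \rfloor_{\sigma} = \lfloor y \rfloor_{\sigma}$, for some field f, or 
• $\lfloor x \rfloor_{\sigma} = \lfloor \mathsf{this} \rfloor_{\sigma}$ and $\lfloor y \rfloor_{\sigma} = \lfloor z \rfloor_{\sigma}$, for some variable z and z appears in 
$\sigma.\mathsf{contn}$. 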


In the last disjunct, where z is a parameter or local variable, we ask that z appears in the 
code being executed (o.contn). This requirement ensures that variables which were 
introduced into the variable map in order to give meaning to existentially quantified 
assertions, are not considered. 
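
The three disjuncts can be made concrete with a small executable sketch. It assumes a simplified encoding that is not the paper's: the heap is a dictionary from addresses to field maps, the top frame is a dictionary `frame_vars` from variable names to addresses, and `contn_vars` is the set of variable names occurring in the continuation. All names are hypothetical.

```python
def access(x, y, frame_vars, contn_vars, heap):
    """True if the object bound to x has direct access to the one bound to y."""
    ax, ay = frame_vars.get(x), frame_vars.get(y)
    if ax is None or ay is None:          # both interpretations must be defined
        return False
    if ax == ay:                          # first disjunct: aliases
        return True
    if ay in heap.get(ax, {}).values():   # second disjunct: a field of x points to y
        return True
    # third disjunct: x is the receiver and y is held by a variable that
    # actually occurs in the code still to be executed
    return ax == frame_vars.get("this") and any(
        frame_vars.get(z) == ay for z in contn_vars
    )


heap = {1: {"myAccount": 2}, 2: {}, 3: {}}
frame_vars = {"this": 1, "a": 2, "b": 3, "q": 3}

via_field = access("this", "a", frame_vars, {"b"}, heap)
via_local = access("this", "b", frame_vars, {"b"}, heap)
not_receiver = access("a", "b", frame_vars, {"b"}, heap)
quantifier_only = access("this", "q", frame_vars, set(), heap)
```

In the last call the variable `q` stands for a binding introduced only to interpret a quantifier: it is in the variable map but not in the continuation, so it grants no access.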


Control expresses which object is in the process of making a function call on another 
object and with what arguments. The relevant information is stored in the continuation 
(contn) on the top frame. 


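Definition 4 (Control). For any modules M, M', variables x, y, $z_1, \ldots, z_n$, we define: 


- $M \fatsemi M', \sigma \models \langle x\ \mathsf{calls}\ y.m(z_1, \ldots, z_n) \rangle$ if $\lfloor x \rfloor_{\sigma}$, $\lfloor y \rfloor_{\sigma}$, $\lfloor z_1 \rfloor_{\sigma}, \ldots, \lfloor z_n \rfloor_{\sigma}$ are 
defined, and 
• $\lfloor \mathsf{this} \rfloor_{\sigma} = \lfloor x \rfloor_{\sigma}$, and 
• $\sigma.\mathsf{contn} = u.m(v_1, \ldots, v_n);\ \_$, for some $u, v_1, \ldots, v_n$, and 
• $\lfloor y \rfloor_{\sigma} = \lfloor u \rfloor_{\sigma}$, and $\lfloor z_i \rfloor_{\sigma} = \lfloor v_i \rfloor_{\sigma}$, for all i. 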

Thus, $\langle x\ \mathsf{calls}\ y.m(z_1, \ldots, z_n) \rangle$ expresses that the call $y.m(z_1, \ldots, z_n)$ will be executed next, 
and that the caller is x. 
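
A minimal sketch of this check follows. It assumes, purely for illustration, that the first statement of the continuation has already been parsed into a triple `(receiver variable, method name, argument variables)` and that the frame is a dictionary from variable names to addresses; `calls`, `frame_vars` and `next_call` are hypothetical names.

```python
def calls(x, y, method, args, frame_vars, next_call):
    """True if x is the receiver and the next statement is y.method(args)."""
    if next_call is None:                 # the continuation does not start with a call
        return False
    u, m, vs = next_call
    # every variable involved must have a defined interpretation
    if any(name not in frame_vars for name in [x, y, u, *args, *vs]):
        return False
    return (
        frame_vars.get("this") == frame_vars[x]      # the caller is x
        and m == method                               # same method name
        and frame_vars[y] == frame_vars[u]            # same callee object
        and len(args) == len(vs)
        and all(frame_vars[z] == frame_vars[v] for z, v in zip(args, vs))
    )


frame_vars = {"this": 10, "c": 10, "acc": 2, "tmp": 2, "dst": 4}
next_call = ("tmp", "deposit", ["dst"])   # continuation starts with tmp.deposit(dst)

matches = calls("c", "acc", "deposit", ["dst"], frame_vars, next_call)
wrong_caller = calls("acc", "acc", "deposit", ["dst"], frame_vars, next_call)
wrong_method = calls("c", "acc", "withdraw", ["dst"], frame_vars, next_call)
```

The comparison is on interpretations rather than on names: `acc` and `tmp` are different variables but denote the same address, so the first query succeeds.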


Viewpoint is about whether an object is viewed as belonging to the internal module; this 
is determined by the class of the object. 


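Definition 5 (Viewpoint). For any modules M, M', and variable x, we define 


- $M \fatsemi M', \sigma \models \mathsf{external}\langle x \rangle$ if $\lfloor x \rfloor_{\sigma}$ is defined and $\mathit{Class}(\lfloor x \rfloor_{\sigma})_{\sigma} \notin \mathit{dom}(M)$ 
- $M \fatsemi M', \sigma \models \mathsf{internal}\langle x \rangle$ if $\lfloor x \rfloor_{\sigma}$ is defined and $\mathit{Class}(\lfloor x \rfloor_{\sigma})_{\sigma} \in \mathit{dom}(M)$ 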


Space is about asserting that some property A holds in a configuration whose objects 
are restricted to those from a given set S. This way we can express that the objects from 
the set S have authority over the assertion A. In order to define validity of $\langle A\ \mathsf{in}\ S \rangle$ in a 
configuration $\sigma$, we first define a restriction operation, $\sigma \downarrow_S$, which restricts the objects 
from $\sigma$ to only those from S. 

Definition 6 (Restriction of Runtime Configurations). The restriction operator $\downarrow$ 
applied to a runtime configuration $\sigma$ and a variable S is defined as follows: 


- $\sigma \downarrow_S\ \triangleq\ (\psi, \chi')$, if $\sigma = (\psi, \chi)$, $\mathit{dom}(\chi') = \lfloor S \rfloor_{\sigma}$, and $\forall \alpha \in \mathit{dom}(\chi').\ \chi(\alpha) = \chi'(\alpha)$. 


For example, if we take $\sigma_2$ from 
Fig. 2 in Section 2, and restrict it 
with some set $S_4$ such that $\lfloor S_4 \rfloor_{\sigma_2}$ = 
{91, 1, 2, 3, 4, 11}, then the restriction 
$\sigma_2 \downarrow_{S_4}$ will look as on the right. 


Note in the diagram above the dangling pointers at objects 1, 11, and 91, 
reminiscent of the separation of heaps into disjoint subheaps, as provided by the $*$ operator 
in separation logic [53]. The difference is that in separation logic, the separation is 
provided through the assertions, where $A * A'$ holds in any heap which can be split into 
disjoint $\chi$ and $\chi'$ where $\chi$ satisfies A and $\chi'$ satisfies A'. That is, in $A * A'$ the split of the 
heap is determined by the assertions A and A' and there is an implicit requirement of 
disjointness, while in $\sigma \downarrow_S$ the split is determined by S, and no disjointness is required. 

We now define the semantics of $\langle A\ \mathsf{in}\ S \rangle$. 


Definition 7 (Space). For any modules M, M', assertions A and variable S, we define: 


- $M \fatsemi M', \sigma \models \langle A\ \mathsf{in}\ S \rangle$ if $M \fatsemi M', \sigma \downarrow_S \models A$ 
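
Restriction and the space connective can be sketched on a toy heap. The encoding below is an assumption made for illustration: a heap is a dictionary from addresses to field maps, an assertion is a Python predicate on heaps, and the stack component of the configuration is omitted. The names `restrict`, `dangling`, `in_space` and `two_hops_defined` are hypothetical.

```python
def restrict(heap, addresses):
    """Keep only the objects at `addresses`; their field values are unchanged."""
    return {a: obj for a, obj in heap.items() if a in addresses}


def dangling(heap):
    """Addresses stored in some field that are no longer in the heap's domain."""
    return {v for obj in heap.values() for v in obj.values() if v not in heap}


def in_space(assertion, heap, addresses):
    """<A in S>: judge the assertion in the heap restricted to `addresses`."""
    return assertion(restrict(heap, addresses))


def two_hops_defined(h):
    """Toy assertion: following `next` twice from object 1 reaches an object."""
    first = h.get(1, {}).get("next")
    second = h.get(first, {}).get("next")
    return second in h


heap = {1: {"next": 2}, 2: {"next": 3}, 3: {"next": 1}}
small = restrict(heap, {1, 2})
```

Restricting to `{1, 2}` leaves a dangling pointer to object 3, so the toy assertion holds in the full heap but not within that smaller set, which matches the remark that an assertion under a too-small S is well formed and simply may fail.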


The set S in the assertion $\langle A\ \mathsf{in}\ S \rangle$ is related to framing from implicit dynamic 
frames [57]: in an implicit dynamic frames assertion $\mathsf{acc}\ x.f * A$, the frame x.f 
prescribes which locations may be used to determine validity of A. The difference is 
that frames are sets of locations (pairs of address and field), while our S-es are sets of 
addresses. More importantly, implicit dynamic frames assertions whose frames are not 
large enough are badly formed, while in our work, such assertions are allowed and may 
hold or not, e.g. $M_{BA2} \fatsemi M', \sigma \models \neg \langle (\exists n.\ a_2.\mathsf{balance} = n)\ \mathsf{in}\ S_4 \rangle$. 


5.2 Satisfaction of Assertions - Time 


To deal with time, we are faced with four challenges: a) validity of assertions in the 
future or the past needs to be judged in the future configuration, but using the bindings 
from the current one, b) the current configuration needs to store the code being executed, 
so as to be able to calculate future configurations, c) when considering the future, we 
do not want to observe configurations which go beyond the frame currently at the top of 
the stack, d) there is no "undo" operator to deterministically enumerate all the previous 
configurations. 

Consider challenge a) in some more detail: the assertion $\mathsf{will}\langle x.f = 3 \rangle$ is satisfied 
in the current configuration $\sigma_1$, if in some future configuration $\sigma_2$, the field f of the 
object that is pointed at by x in the current configuration ($\sigma_1$) has the value 3, that is, if 
$\lfloor \lfloor x \rfloor_{\sigma_1}.f \rfloor_{\sigma_2} = 3$, even if in that future configuration x denotes a different object (i.e. if 
$\lfloor x \rfloor_{\sigma_1} \neq \lfloor x \rfloor_{\sigma_2}$). To address this, we define an auxiliary concept: the operator $\triangleleft$, where 
$\sigma_1 \triangleleft \sigma_2$ adapts the second configuration to the top frame’s view of the former: it returns 
a new configuration whose stack comes from $\sigma_2$ but is augmented with the view from 
the top frame from $\sigma_1$ and where the continuation has been consistently renamed. This 
allows us to interpret expressions in $\sigma_2$ but with the variables bound according to $\sigma_1$; 
e.g. we can obtain the value of x in configuration $\sigma_2$ even if x was out of scope in $\sigma_2$. 


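Definition 8 (Adaptation). For runtime configurations $\sigma_1$, $\sigma_2$: 


- $\sigma_1 \triangleleft \sigma_2\ \triangleq\ (\phi_3 \cdot \psi_2, \chi_2)$ if 
• $\phi_3 = (\mathsf{contn}_2[zs_2/zs'],\ \beta_2[zs' \mapsto \beta_2(zs_2)][zs_1 \mapsto \beta_1(zs_1)])$, where 
• $\sigma_1 = (\phi_1 \cdot \_, \_)$, $\sigma_2 = (\phi_2 \cdot \psi_2, \chi_2)$, $\phi_1 = (\_, \beta_1)$, $\phi_2 = (\mathsf{contn}_2, \beta_2)$, and 
• $zs_1 = \mathit{dom}(\beta_1)$, $zs_2 = \mathit{dom}(\beta_2)$, and 
• $zs'$ is a set of variables with the same cardinality as $zs_2$, and all variables in 
$zs'$ are fresh in $\beta_1$ and in $\beta_2$. 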


That is, in the new frame $\phi_3$ from above, we keep the same continuation as from 
$\sigma_2$ but rename all variables with fresh names $zs'$, and combine the variable map $\beta_1$ 
from $\sigma_1$ with the variable map $\beta_2$ from $\sigma_2$ while avoiding name clashes through the 
renaming $[zs' \mapsto \beta_2(zs_2)]$. The consistent renaming of the continuation allows the 
correct modelling of execution, as needed for the semantics of nested time assertions, 
as e.g. in $\mathsf{will}\langle x.f = 3 \wedge \mathsf{will}\langle x.f = 5 \rangle \rangle$. 
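
The adaptation operator can be sketched as follows. The encoding is an assumption made for this example only: a configuration is a pair of a stack (a list of frames, top first) and a heap, a frame is a pair of a continuation (a list of tokens) and a variable map, and fresh names are generated as `_v0`, `_v1`, and so on. Renaming by token is a simplification that would also rename a field that happens to share a name with a variable.

```python
import itertools


def adapt(sigma1, sigma2):
    """Adapt sigma2 to the view of the top frame of sigma1."""
    (frames1, _heap1), (frames2, heap2) = sigma1, sigma2
    _contn1, beta1 = frames1[0]
    contn2, beta2 = frames2[0]

    # pick one fresh name per variable of beta2, avoiding both variable maps
    used = set(beta1) | set(beta2)
    candidates = (f"_v{i}" for i in itertools.count())
    renaming = {z: next(n for n in candidates if n not in used) for z in beta2}

    contn3 = [renaming.get(tok, tok) for tok in contn2]   # renamed continuation
    beta3 = {renaming[z]: v for z, v in beta2.items()}    # renamed future bindings
    beta3.update(beta1)                                   # current bindings on top
    return ([(contn3, beta3)] + frames2[1:], heap2)


sigma1 = ([(["x", ".", "f", ":=", "y"], {"this": 1, "x": 7, "y": 8})], {})
sigma2 = (
    [(["x", ".", "g", ":=", "this"], {"this": 2, "x": 9}), (["return"], {"this": 1})],
    {7: {"f": 3}},
)
frames, heap = adapt(sigma1, sigma2)
contn3, beta3 = frames[0]
```

After adaptation `x` denotes address 7, as in the current configuration, while the heap and the rest of the stack are those of the future configuration; the future frame's own `x` survives under a fresh name, so its continuation still runs as before.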

Having addressed challenge a) we turn our attention to the remaining challenges: 
We address challenge b) by storing the remaining code to be executed in contn in each 
frame. We address challenge c) by only taking the top frame of the stack when considering 
future executions. Finally, we address challenge d) by considering only configurations 
which arise from initial configurations, and which lead to the current configuration. 


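Definition 9 (Time Assertions). For any modules M, M', and assertion A we define 

- $M \fatsemi M', \sigma \models \mathsf{next}\langle A \rangle$ if 
$\exists \sigma'.\ [\ M \fatsemi M', \phi \leadsto \sigma' \ \wedge\ M \fatsemi M', \sigma \triangleleft \sigma' \models A\ ]$, 
and where $\phi$ is so that $\sigma = (\phi \cdot \_, \_)$. 

- $M \fatsemi M', \sigma \models \mathsf{will}\langle A \rangle$ if 
$\exists \sigma'.\ [\ M \fatsemi M', \phi \leadsto^* \sigma' \ \wedge\ M \fatsemi M', \sigma \triangleleft \sigma' \models A\ ]$, 
and where $\phi$ is so that $\sigma = (\phi \cdot \_, \_)$. 

- $M \fatsemi M', \sigma \models \mathsf{prev}\langle A \rangle$ if 
$\forall \sigma_1, \sigma_2.\ [\ \mathit{Initial}(\sigma_1) \ \wedge\ M \fatsemi M', \sigma_1 \leadsto^* \sigma_2 \ \wedge\ M \fatsemi M', \sigma_2 \leadsto \sigma \ \longrightarrow\ M \fatsemi M', \sigma \triangleleft \sigma_2 \models A\ ]$ 

- $M \fatsemi M', \sigma \models \mathsf{was}\langle A \rangle$ if 
$\forall \sigma_1.\ [\ \mathit{Initial}(\sigma_1) \ \wedge\ M \fatsemi M', \sigma_1 \leadsto^* \sigma \ \longrightarrow\ (\ \exists \sigma_2.\ M \fatsemi M', \sigma_1 \leadsto^* \sigma_2 \ \wedge\ M \fatsemi M', \sigma_2 \leadsto^* \sigma \ \wedge\ M \fatsemi M', \sigma \triangleleft \sigma_2 \models A\ )\ ]$ 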
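
The two future operators can be sketched as searches over a step relation. The sketch assumes a finite state space, reads $\leadsto^*$ as zero or more steps, and abstracts away the restriction to the top frame; `step`, `adapt` and the assertion are passed in as functions, and all names are hypothetical.

```python
def next_holds(sigma, step, adapt, holds):
    """next<A>: A holds, after adaptation, in some configuration one step ahead."""
    return any(holds(adapt(sigma, successor)) for successor in step(sigma))


def will_holds(sigma, step, adapt, holds):
    """will<A>: A holds, after adaptation, in some configuration reachable
    from sigma in zero or more steps (depth-first search, finite state space)."""
    seen, frontier = set(), [sigma]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        if holds(adapt(sigma, current)):
            return True
        frontier.extend(step(current))
    return False


def step(n):
    """Toy system: a configuration is a counter that can grow up to 5."""
    return [n + 1] if n < 5 else []


def adapt(now, future):
    """Toy adaptation: pair the current bindings with the future state."""
    return (now, future)


def three_more(pair):
    """Toy assertion: the future counter is exactly 3 more than the current one."""
    now, future = pair
    return future == now + 3


soon = next_holds(0, step, adapt, three_more)
eventually = will_holds(0, step, adapt, three_more)
never = will_holds(4, step, adapt, three_more)
```

The toy assertion mentions both the current and the future value, which is the situation adaptation is designed for: the bindings come from the configuration where the assertion is judged, the state from the configuration reached later.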


In general, $\langle \mathsf{will}\langle A \rangle\ \mathsf{in}\ S \rangle$ is different from $\mathsf{will}\langle \langle A\ \mathsf{in}\ S \rangle \rangle$. In the former assertion, 
S must contain the objects involved in reaching the future configuration as well as 
the objects needed to then establish validity of A in that future configuration. In the 
latter assertion, S need only contain the objects needed to establish A in that future 
configuration. For example, revisit Fig. 2, and take $S_1$ to consist of objects 1, 2, 4, 93, 
and 94, and $S_2$ to consist of objects 1, 2, 4. Assume that $\sigma_5$ is like $\sigma_1$, that the next 
call in $\sigma_5$ is a method on $u_{94}$, whose body obtains the address of $a_4$ (by making a call 
on 93 to which it has access), and the address of $a_2$ (to which it has access), and then 
makes the call a2.deposit(a4, 360). Assume also that a4’s balance is 380. Then 
$M_{BA1} \fatsemi \ldots, \sigma_5 \models \langle \mathsf{will}\langle \mathsf{changes}\langle a_2.\mathsf{balance} \rangle \rangle\ \mathsf{in}\ S_1 \rangle$ 
$M_{BA1} \fatsemi \ldots, \sigma_5 \not\models \langle \mathsf{will}\langle \mathsf{changes}\langle a_2.\mathsf{balance} \rangle \rangle\ \mathsf{in}\ S_2 \rangle$ 
$M_{BA1} \fatsemi \ldots, \sigma_5 \models \mathsf{will}\langle \langle \mathsf{changes}\langle a_2.\mathsf{balance} \rangle\ \mathsf{in}\ S_2 \rangle \rangle$ 


434 S. Drossopoulou et al. 


5.3 Properties of Assertions 


We define equivalence of assertions in the usual way: assertions A and A' are equivalent 
if they are satisfied in the context of the same configurations and module pairs, i.e. 

$$A \equiv A' \quad\text{if}\quad \forall \sigma.\ \forall M, M'.\ [\ M \fatsemi M', \sigma \models A \ \text{ if and only if }\ M \fatsemi M', \sigma \models A'\ ].$$

We can then prove that the usual equivalences hold, e.g. $A \vee A' \equiv A' \vee A$, and 
$\neg(\exists x. A) \equiv \forall x.(\neg A)$. Our assertions are classical, e.g. $A \wedge \neg A \equiv \mathsf{false}$, and 
$M \fatsemi M', \sigma \models A$ and $M \fatsemi M', \sigma \models A \rightarrow A'$ implies $M \fatsemi M', \sigma \models A'$. This desirable property 
comes at the loss of some expected equivalences, e.g., in general, $e = \mathsf{false}$ and $\neg e$ 
are not equivalent. More in Appendix B. 


5.4 Modules satisfying assertions 


Finally, we define satisfaction of assertions by modules: a module M satisfies an 
assertion A if for all other potential modules M', in all configurations arising from executions 
of $M \fatsemi M'$, the assertion A holds. 


Definition 10. For any module M, and assertion A, we define: 


- $M \models A$ if $\forall M'.\ \forall \sigma \in \mathit{Arising}(M \fatsemi M').\ M \fatsemi M', \sigma \models A$ 


6 Exemplar Driven Design 


Exemplars. The design of Chainmail was guided by the study of a sequence of 
exemplars taken from the object-capability literature and the smart contracts world: 


1. Bank [49] - Bank and Account as in Section 2 with two different implementations. 
2. ERC20 [61] - Ethereum-based token contract. 
3. DAO [12,15] - Ethereum contract for Decentralised Autonomous Organisation. 
4. DOM [20,59] - Restricting access to browser Document Object Model. 

We present these exemplars as appendices [1]. Our design was also driven by work on 
other examples such as the membrane [17], the Mint/Purse [40], and Escrow [18,24]. 


Model. We have constructed a Coq model⁷ [23] of the core of the Chainmail 
specification language, along with the underlying Loo language. Our formalism is organised as 
follows: 


1. The Loo Language: a class based, object oriented language with mutable references. 
2. Chainmail: The full assertion syntax and semantics defined in Definitions 1, 2, 3, 
4, 5, 6, 7, 8, 9 and 10. 
3. Loo Properties: Secondary properties of the Loo language that aid in reasoning about 
its semantics. 
4. Chainmail Properties: The core properties defined on the semantics of Chainmail. 


7 A current model can be found at: https://github.com/sophialC/HolisticSpecifications 
In the associated appendix (see Appendix G) we list and present the properties of 
Chainmail we have formalised in Coq. We have proven that Chainmail obeys many of 
the properties of classical logic. While we formalise most of the underlying semantics, 
we make several assumptions in our Coq formalism: (i) the law of the excluded middle, 
a property that is well known to be unprovable in constructive logics, and (ii) the 
equality of variable maps and heaps up to renaming. Coq formalisms often require fairly 
verbose definitions and proofs of properties involving variable substitution and 
renaming, and assuming equality up to renaming saves much effort. 

More details of the formal foundations of Chainmail, and the model, are also in 
appendices [1]. 


7 Related Work 


Behavioural Specification Languages. Hatcliff et al. [26] provide an excellent survey 
of contemporary specification approaches. With a lineage back to Hoare logic [28], 
Meyer’s Design by Contract [38] was the first popular attempt to bring verification 
techniques to object-oriented programs as a “whole cloth” language design in Eiffel. 
Several more recent specification languages are now making their way into practical 
and educational use, including JML [31], Spec# [4], Dafny [32] and Whiley [51]. Our 
approach builds upon these fundamentals, particularly Leino & Schulte’s formulation 
of two-state invariants [33], and Summers and Drossopoulou’s Considerate 
Reasoning [58]. In general, these approaches assume a closed system, where modules can be 
trusted to cooperate. In this paper we aim to work in an open system where modules’ 
invariants must be protected irrespective of the behaviour of the rest of the system. 


Defensive Consistency. In an open world, we cannot rely on the kindness of strangers: 
rather we have to ensure our code is correct regardless of whether it interacts with 
friends or foes. Attackers “only have to be lucky once” while secure systems “have 
to be lucky always” [5]. Miller [39,40] defines the necessary approach as defensive 
consistency: “An object is defensively consistent when it can defend its own invariants 
and provide correct service to its well behaved clients, despite arbitrary or malicious 
misbehaviour by its other clients.” Defensively consistent modules are particularly hard 
to design, to write, to understand, and to verify: but they make it much easier to make 
guarantees about systems composed of multiple components [46]. 


Object Capabilities and Sandboxes. Capabilities as a means to support the 
development of concurrent and distributed systems were developed in the 60’s by Dennis and 
Van Horn [19], and were adapted to the programming languages setting in the 70’s [44]. 
Object capabilities were first introduced [40] in the early 2000s, and many recent stud- 
ies manage to verify safety or correctness of object capability programs. Google’s Caja 
[42] applies sandboxes, proxies, and wrappers to limit components’ access to ambient 
authority. Sandboxing has been validated formally: Maffeis et al. [35] develop a model 
of JavaScript, demonstrate that it obeys two principles of object capability systems and 
show how untrusted applications can be prevented from interfering with the rest of the 
system. Recent programming languages [27,10,54] including Newspeak [9], Dart [8], 
Grace [7,30] and Wyvern [36] have adopted the object capability model. 
Verification of Object Capability Programs. Murray made the first attempt to formalise 
defensive consistency and correctness [46]. Murray’s model was rooted in counterfac- 
tual causation [34]: an object is defensively consistent when the addition of untrust- 
worthy clients cannot cause well-behaved clients to be given incorrect service. Murray 
formalised defensive consistency very abstractly, over models of (concurrent) object- 
capability systems in the process algebra CSP [29], without a specification language 
for describing effects, such as what it means for an object to provide incorrect service. 
Both Miller and Murray’s definitions are intensional, describing what it means for an 
object to be defensively consistent. 


Drossopoulou and Noble [21,48] have analysed Miller’s Mint and Purse example 
[40] and discussed the six capability policies as proposed in [40]. In [22], they sketched 
a specification language, used it to specify the six policies from [40], showed that 
several interpretations were possible, and uncovered the need for four further 
policies. They also sketched how a trust-sensitive example (the escrow exchange) 
could be verified in an open world [24]. Their work does not support the concepts of 
control, time, or space, as in Chainmail, but it offers a primitive expressing trust. 


Devriese et al. [20] have deployed powerful theoretical techniques to address sim- 
ilar problems: They show how step-indexing, Kripke worlds, and representing objects 
as state machines with public and private transitions can be used to reason about object 
capabilities. Devriese et al. have demonstrated solutions to a range of exemplar problems, 
including the DOM wrapper (replicated in our section F) and a mashup application. Their 
distinction between public and private transitions is similar to the distinction between 
internal and external objects. 


More recently, Swasey et al. [59] designed OCPL, a logic for object capability pat- 
terns, that supports specifications and proofs for object-oriented systems in an open 
world. They draw on verification techniques for security and information flow: separ- 
ating internal implementations (“high values” which must not be exposed to attacking 
code) from interface objects (“low values” which may be exposed). OCPL supports de- 
fensive consistency (they use the term “robust safety” from the security community [6]) 
via a proof system that ensures low values can never leak high values to external at- 
tackers. This means that low values can be exposed to external code, and the behaviour 
of the system is described by considering attacks only on low values. They use that lo- 
gic to prove a number of object-capability patterns, including sealer/unsealer pairs, the 
caretaker, and a general membrane. 


Schaefer et al. [55] have recently added support for information-flow security us- 
ing refinement to ensure correctness (in this case confidentiality) by construction. By 
enforcing encapsulation, all these approaches share similarity with techniques such as 
ownership types [14,50], which also protect internal implementation objects from 
accesses that cross encapsulation boundaries. Some time ago, Banerjee and Naumann 
demonstrated that by ensuring confinement, ownership systems can enforce representation 
independence (a property close to “robust safety”) [3]. 


Chainmail differs from Swasey’s, Schaefer’s, and Devriese’s work in a number of 
ways: They are primarily concerned with mechanisms that ensure encapsulation (aka 
confinement) while we abstract away from any mechanism via the $\mathsf{external}\langle \_ \rangle$ 
predicate. They use powerful mathematical techniques which the users need to understand in 
order to write their specifications, while Chainmail users only need to understand first 
order logic and the holistic operators presented in this paper. Finally, none of these 
systems offer the kinds of holistic assertions addressing control flow, change, or temporal 
operations that are at the core of Chainmail’s approach. 

Scilla [56] is a minimalistic typed functional language for writing smart contracts 
that compiles to the Ethereum bytecode. Scilla’s semantic model is restricted, assuming 
actor based communication and restricting recursion, thus facilitating static analysis of 
Scilla contracts and ensuring termination. Scilla is able to demonstrate that a number of 
popular Ethereum contracts avoid type errors and out-of-gas resource failures, and 
preserve virtual currency. Scilla’s semantics are defined formally, but have not yet been 
represented in a mechanised model. 

Finally, the recent VerX tool is able to verify a range of specifications for Solidity 
contracts automatically [52]. Similar to Chainmail, VerX has a specification language 
based on temporal logic. VerX offers three temporal operators (always, once, prev) but 
only within a past modality, while Chainmail has two temporal operators, both existen- 
tial, but with both past and future modalities. VerX specifications can also include pre- 
dicates that model the current invocation on a contract (similar to Chainmail’s “calls”), 
can access variables, and compute sums (only) over collections. Chainmail is strictly 
more expressive as a specification language, including quantification over objects and 
sets (so can compute arbitrary reductions on collections) and of course specifications 
for permission (“access”), space (“in”) and viewpoint (“external”) which have no ana- 
logues in VerX. Unlike Chainmail, VerX includes a practical tool that has been used 
to verify a hundred properties across case studies of twelve Solidity contracts. 


8 Conclusions 


In this paper we have motivated the need for holistic specifications, presented the spe- 
cification language Chainmail for writing such specifications, and outlined the formal 
foundations of the language. To focus on the key attributes of a holistic specification 
language, we have kept Chainmail simple, only requiring an understanding of first 
order logic. We believe that the holistic features (permission, control, time, space and 
viewpoint) are intuitive concepts when reasoning informally, and were pleased to have 
been able to provide their formal semantics in what we argue is a simple manner. 
Abstract. The automated generation of graph models has become an 
enabler in several testing scenarios, including the testing of modeling 
environments used in the design of critical systems, or the synthesis of 
test contexts for autonomous vehicles. Those approaches rely on the au- 
tomated construction of consistent graph models, where each model sat- 
isfies complex structural properties of the target domain captured in 
first-order logic predicates. In this paper, we propose a transformation 
technique to map such graph generation tasks to a problem consisting of 
first-order logic formulae, which can be solved by state-of-the-art TPTP- 
compliant theorem provers, producing valid graph models as outputs. 
We conducted performance measurements over all 73 theorem provers 
available in the TPTP library, and compared our approach with other 
solver-based approaches like Alloy and VIATRA Solver. 


Keywords: Domain-Specific Modeling Languages - Model Generation - 
Theorem Provers 


1 Introduction 


Motivation. Synthetic graph models have been in use for many challenges of 
software engineering including the testing of object-oriented programs [18, 20], 
quality assurance of domain-specific languages [28], validation of model transfor- 
mations [7] or performance benchmarks of model repositories [5]. In particular, 
various lines of research in model-driven engineering rely upon such graph mod- 
els. Network science also heavily depends on the availability of graph models 
with designated distribution of nodes and edges. 

Active research in automated graph model generation [10,25,30,31] has been 
focusing on deriving graphs with desirable properties like consistency, diversity, 
scalability or realistic nature [37]. A particularly challenging task of domain- 
specific model generators is to ensure consistency, i.e. to guarantee that synthetic 
models are not only compliant with the metamodel of the domain, but they 
also satisfy additional well-formedness constraints captured in popular high-level 
languages like OCL or graph patterns. 


© The Author(s) 2020 
Problem statement. Consistent graph generators frequently rely on 
back-end solvers by mapping model generation problems into logic formulae with 
ferent levels of expressiveness. For example, SAT-solvers are used by Kodkod [34] 
that map high-level languages to propositional logic, CSP-solvers are exploited 
in EMF2CSP [10], while SMT-solvers were applied in [12, 15, 28]. Consistent 
model generators may rely on custom search-based techniques [31], symbolic 
techniques [25] or custom decision procedures [9,30] to improve scalability. 


Automated theorem proving techniques have been developed within the au- 
tomated reasoning community for decades with a wide range of supporting tools 
such as HOL [11] and Vampire [19]. In particular, first-order theorem provers 
have an extensive tool competition where each participating tool takes logic 
problems using a unified representation of first-order logic (FOL) formulae. This 
suggests that, despite not being designed for model generation, theorem provers 
may provide interesting results within the domain considering the success of 
other general-purpose approaches. 


Interestingly, while theorem provers have been used in model-driven 
engineering to prove the consistency of specifications (e.g. HOL-OCL [6], Maude, KeY), 
their performance has not been investigated in depth for model generation pur- 
poses. Since FOL theorem provers already have to face undecidability issues, they 
are typically optimized to quickly find inconsistencies in formal specifications, 
while generating a model as a proof of consistency may be less of a priority. As 
such, existing mappings to FOL formulae may not be reusable in their entirety 
when theorem provers are used for consistent model generation. 


Objectives. In this paper, we aim to systematically investigate and evalu- 
ate the use of first-order logic theorem provers for model generation purposes. 
In particular, we present a mapping of domain specifications consisting of a 
metamodel, well-formedness constraints and an optional initial seed model to 
FOL formulae. Using the standard Thousands of Problems for Theorem Provers 
(TPTP) format for representing FOL formulae, we used 73 different theorem 
provers and solvers in a total of 87 different configurations to generate instance 
models of various sizes in the context of an industrial domain-specific modeling
tool (Yakindu Statecharts) for a scalability evaluation of those solvers. Finally, 
model results can be transformed to instance models of the domain that can be 
opened in their native editor - although implementing this step turned out to be 
solver-specific. 


Added value. While various back-end solvers have been used in related 
mappings, the integration and inclusion of an entire family of first-order logic 
theorem provers is a novel practical result. Furthermore, our paper provides the 
first evaluation of a wide range of theorem provers for model generation purposes. 
As an important technical side effect, thanks to a novel use of constants as 
object identifiers incorporated in the mapping to FOL formulae, we managed to 
significantly improve the scalability of the Z3 SMT-solver for model generation 
purposes compared to existing approaches [28,32], which relied upon the native 
support of decision procedures in SMT-solvers. 
2 Preliminaries


The core concepts of domain-specific languages (DSL) and tools are illustrated in 
the context of Yakindu Statecharts [39], which is an industrial DSL for developing 
reactive, event-driven systems, and supports validation and code generation. 


2.1 Models and metamodels 


In this paper we use EMF as a metamodeling technique which is widely used in 
the modeling community. Formally [28], an (EMF) metamodel defines a vocabulary
$\Sigma = \{C_1, \dots, C_n, R_1, \dots, R_m, c_1, \dots, c_o\}$, where a unary
predicate symbol $C_i$ is defined for each EClass and EDataType (like EInteger or
EEnums), a binary predicate symbol $R_j$ is derived for each EReference and
EAttribute, and constant symbols $c_k$ are defined for EEnum literals.


Example 1. A simplified metamodel for Yakindu Statecharts is illustrated in 
Figure 1. A Statechart consists of Regions, which contain states (Vertex) and 
Transitions. The abstract state Vertex is further refined into RegularStates 
(like State or FinalState) and PseudoStates (like Entry, Exit or Choice). 
Entry states have a Type attribute of type EntryType. 


Additionally, a metamodel also imposes several structural constraints: 

1. Type Hierarchy (TH) expresses the correct combination of classes (e.g. if an 
object is an Entry then it must be a Vertex, but it cannot be a Region); 

2. Type Compliance (TC) requires that for any relation R(o, t), its source and 
target objects o and t must have compliant types (e.g. the target of a refer- 
ence target must be an instance of Vertex); 

3. Abstract (ABS): If a class is defined as abstract, it is not allowed to have 
direct instances (like CompositeElement); 

4. Multiplicity (MUL) of structural features can be limited with upper and 
lower bound in the form of “lower..upper” (e.g. 1..1 for reference target); 

5. Inverse (INV) states that two parallel references of opposite direction always 
occur in pairs (e.g. outgoingTransitions and source). 

6. Containment (CON): Instance models in EMF are expected to be arranged 
into a containment hierarchy, which is a directed tree along relations marked 
in the metamodel as containment (e.g., vertices or outgoingTransitions). 


444 A. A. Babikian et al. 


[Diagrams omitted] (a) Valid Yakindu instance model (b) Invalid, cyclic Yakindu instance model


Fig. 2: Sample Yakindu Statechart instance models 


An instance model can be represented as a logic structure $M = (O_M, I_M)$, where
$O_M$ is the finite set of objects, and $I_M$ provides interpretation for all
predicate symbols in $\Sigma$ as follows:

– The interpretation of a unary predicate symbol $C_i$ is defined in accordance
with the types of the EMF model: $I_M(C_i) : O_M \to \{1, 0\}$. An object $o \in O_M$
is an instance of (more precisely, conforms to) a class $C_i$ in a model $M$ if
$I_M(C_i)(o) = 1$. It is possible for an object to conform to multiple types, e.g.
in case of inheritance or abstract classes.

– The interpretation of a binary predicate symbol $R_j$ is defined in accordance
with the links in the EMF model: $I_M(R_j) : O_M \times O_M \to \{1, 0\}$. There is a
reference $R_j$ between $o_1, o_2 \in O_M$ in model $M$ if $I_M(R_j)(o_1, o_2) = 1$.

– The interpretation maps each constant symbol $c_k$ to an object: $I_M : c_k \to O_M$.


Example 2. Figure 2a illustrates an instance model $M$ with objects $O_M =
\{sc1, r1, s1, t1, e1\}$. Classes of the objects are added as labels (e.g. label sc1:
Statechart denotes $I_M(Statechart)(sc1) = 1$), attribute values are illustrated
as attribute=value labels (e.g. Type = Normal as $I_M(Type)(e1, Normal) = 1$),
and reference predicates as labelled edges (e.g. the regions edge from sc1 to r1 as
$I_M(regions)(sc1, r1) = 1$).
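The logic structure of Example 2 can be sketched in a few lines of Python. This is only an illustration: the dictionaries, the function names, and the object types of s1, t1 and e1 (read off the labels of Figure 2a) are assumptions of the sketch, not part of the formalization in [28].

```python
from typing import Dict, Set, Tuple

# Objects O_M of Example 2.
OBJECTS: Set[str] = {"sc1", "r1", "s1", "t1", "e1"}

# Unary predicates: the objects o for which I_M(C)(o) = 1.
# Vertex is listed because an object may conform to several types.
CLASSES: Dict[str, Set[str]] = {
    "Statechart": {"sc1"},
    "Region": {"r1"},
    "State": {"s1"},
    "Transition": {"t1"},
    "Entry": {"e1"},
    "Vertex": {"s1", "e1"},
}

# Binary predicates: the pairs (o1, o2) for which I_M(R)(o1, o2) = 1.
# Only the two links named in Example 2 are listed here.
RELATIONS: Dict[str, Set[Tuple[str, str]]] = {
    "regions": {("sc1", "r1")},
    "Type": {("e1", "Normal")},
}


def interp_class(cls: str, obj: str) -> int:
    """I_M(C)(o): 1 if obj conforms to cls, 0 otherwise."""
    return int(obj in CLASSES.get(cls, set()))


def interp_relation(rel: str, src: str, trg: str) -> int:
    """I_M(R)(o1, o2): 1 if the link exists, 0 otherwise."""
    return int((src, trg) in RELATIONS.get(rel, set()))
```

With this encoding, `interp_class("Statechart", "sc1")` and `interp_relation("regions", "sc1", "r1")` both evaluate to 1, mirroring the two statements of the example.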


2.2 Model predicates and Well-formedness constraints 


In many industrial modeling tools, domain-specific WF constraints are defined by 
error predicates captured either as OCL constraints [24] or as graph patterns [35]. 
A major practical subclass of such constraints can be formalized using first-order 
logic predicates [28]. 

A graph predicate $\varphi$ is defined inductively over a vocabulary $\Sigma$ of a
metamodel, an infinite set of (object) variables $\{v_1, v_2, \dots\}$ and the
constant symbols, as seen in Figure 3a. A graph predicate $\varphi$ with free variables
$param = \{v_1, \dots, v_n\}$ can be evaluated over a model $M$ with variable binding
$Z : param \to O_M$ (denoted with $[\![\varphi(v_1, \dots, v_n)]\!]^M_Z$) using the rules of Figure 3b.

Therefore, if a domain defines error patterns $\varphi_1, \dots, \varphi_n$, a model is
considered consistent (valid) if it does not satisfy any error predicate $\varphi_i(v_1, \dots, v_m)$
($1 \le i \le n$), i.e. $\forall v_1, \dots, v_m : \neg\varphi_i(v_1, \dots, v_m)$. Since a formalization of these
structural restrictions as WF constraints is provided in [28], the predicate
language of Figure 3b can uniformly be used for both kinds of structural constraints.


Example 3. Figure 4 illustrates three graph patterns defined in both graph- 
ical and textual syntax. Pattern transition(t,src,trg) defines a relation 
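Logic Syntax                                                TPTP Syntax              Description
$\varphi := c$                                              c                        constant
$\mid C(v)$                                                 C(v)                     type predicate
$\mid R(v_1, v_2)$                                          R(v1,v2)                 reference predicate
$\mid v_1 = v_2$                                            v1=v2                    equivalence
$\mid dist(v_1, \dots, v_n)$                                v1!=v2 &...& vn-1!=vn    n-ary inequality (distinctness)
$\mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2$     ~p | p1&p2 | p1|p2       logic connectives
$\mid \exists v : \varphi \mid \forall v : \varphi$         ?[v]:p | ![v]:p          quantified expression
(a) Syntax of graph predicates

$[\![c]\!]^M_Z := I_M(c)$
$[\![C(v)]\!]^M_Z := I_M(C)(Z(v))$
$[\![R(v_1, v_2)]\!]^M_Z := I_M(R)(Z(v_1), Z(v_2))$
$[\![v_1 = v_2]\!]^M_Z := Z(v_1) = Z(v_2)$
$[\![\neg\varphi]\!]^M_Z := 1 - [\![\varphi]\!]^M_Z$
$[\![\varphi_1 \wedge \varphi_2]\!]^M_Z := \min([\![\varphi_1]\!]^M_Z, [\![\varphi_2]\!]^M_Z)$
$[\![\varphi_1 \vee \varphi_2]\!]^M_Z := \max([\![\varphi_1]\!]^M_Z, [\![\varphi_2]\!]^M_Z)$
$[\![\exists v : \varphi]\!]^M_Z := \max\{[\![\varphi]\!]^M_{Z, v \mapsto o} : o \in O_M\}$
$[\![\forall v : \varphi]\!]^M_Z := \min\{[\![\varphi]\!]^M_{Z, v \mapsto o} : o \in O_M\}$
(b) Semantic rules for graph predicates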
Fig. 3: Syntax and semantics for graph predicates 
transition(t,src,trg)
    pattern transition(t,src,trg) {
      Transition.source(t,src);
      Transition.target(t,trg); }
    $transition(t, src, trg) = source(t, src) \wedge target(t, trg)$

incomingToEntry(t,e:Entry)
    @Constraint
    pattern incomingToEntry(t, e:Entry) {
      find transition(t,_,e); }
    $ite(t, e) = \exists s : transition(t, s, e) \wedge Entry(e)$

noOutgoingTransitionFromEntry(e)
    @Constraint
    pattern noOutgoing(e:Entry) {
      neg find transition(_,e,_); }
    $no(e) = \forall t, trg : \neg transition(t, e, trg) \wedge Entry(e)$


Fig. 4: Example graph patterns defined with graphical and VIATRA syntax 


between two Vertices which are connected via a Transition using source 
and target references. Reusing this pattern, two WF constraints are defined 
concerning Entry states: if any of them has a match, then the model is mal- 
formed. First, incomingToEntry(t, e) selects invalid Transitions that are 
leading to an Entry (by reusing the previously defined transition pattern). 
Next, noOutgoingTransitionFromEntry(e) matches Entry states that do
not have any outgoing Transition (by negatively using the transition pattern).
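The min/max semantics of Figure 3b can be exercised on the three patterns of Example 3. The following is a minimal sketch over a hand-made model; the object names, the relation sets and the function names are illustrative assumptions, and the model is deliberately malformed so that both WF constraints have a match.

```python
from itertools import product

# A small, deliberately malformed model (all names are hypothetical).
OBJECTS = ["e1", "e2", "s1", "t1", "t2"]
ENTRY = {"e1", "e2"}                      # objects conforming to Entry
SOURCE = {("t1", "e1"), ("t2", "s1")}     # source(t, src) links
TARGET = {("t1", "s1"), ("t2", "e2")}     # target(t, trg) links


def transition(t: str, src: str, trg: str) -> int:
    # Conjunction is evaluated as min over truth values in {0, 1}.
    return min(int((t, src) in SOURCE), int((t, trg) in TARGET))


def incoming_to_entry(t: str, e: str) -> int:
    # Existential quantification is a max over all objects.
    exists_s = max(transition(t, s, e) for s in OBJECTS)
    return min(exists_s, int(e in ENTRY))


def no_outgoing(e: str) -> int:
    # Universal quantification is a min; negation is 1 - value.
    forall = min(1 - transition(t, e, trg)
                 for t, trg in product(OBJECTS, repeat=2))
    return min(forall, int(e in ENTRY))


def violations():
    """Return the matches of both error patterns in the model."""
    ite = [(t, e) for t, e in product(OBJECTS, repeat=2)
           if incoming_to_entry(t, e)]
    no = [e for e in OBJECTS if no_outgoing(e)]
    return ite, no
```

Here t2 leads into the Entry state e2, and e2 has no outgoing transition, so each constraint reports exactly one match. A consistent model is one for which both lists are empty.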


2.3 First-Order Logic Theorem Provers 


Our approach to model generation involves using a back-end FOL theorem prover 
to generate finite models according to input constraints. The theorem prover is 
treated as a black-box component in our model generation workflow, thus it takes 
input formulae and generates an output formula. Logic formulae are given using 
the TPTP Syntax [33] as it is a standard within the theorem prover community. 
Fig. 5: Overview of our model generation approach


The TPTP syntax defines multiple forms of logic formulae, such as Full First- 
order Form (FOF) and Typed Higher-order Form (THF). Our mapping derives 
FOF formulae defined by a subsyntax that can handle standard FOL statements. 
This is sufficient for modeling most aspects of EMF and WF constraints. Omitted
aspects include containment cycle avoidance and numeric attributes.

Regarding the output of TPTP-compliant theorem provers, there does not 
seem to be a standard. Provers may output FOF formulae, other TPTP formu- 
lae, or TPTP non-compliant formulae. This is not surprising, as many TPTP- 
compliant solvers also handle various other syntaxes. As a result, in order to in- 
terpret the output of TPTP-compliant provers, one must create a custom parser 
for each prover, which is laborious. However, despite syntactic differences, prover 
outputs are structurally similar: in most cases, the output contains a list of graph 
nodes, where each node is associated to corresponding types and graph edges. 


3 Overview of the Approach 


Our approach (summarized in Figure 5) aims to generate graph models that 
are consistent with respect to WF constraints of a domain-specific modeling 
environment using theorem provers as back-end solvers. For this purpose, we 
map the high-level specifications of the input DSL into equivalent FOL formulae 
written in TPTP-compliant syntax [33]. We implement our approach as part of 
the VIATRA Model Generation Framework [1]. 

The specification of the DSL (or modeling environment) consists of a meta- 
model specified in EMF augmented with well-formedness constraints captured 
by model queries (using the VIATRA framework [36]). Additionally, our gener- 
ator can take an optional initial instance model that acts as a seed for model 
generation. Our model generation framework can also take various search pa- 
rameters such as type scope (requested size) and containment cycle avoidance 
specifications as input to guide model generation towards desired characteristics. 

The input modeling environment and the search parameters are mapped to 
FOL formulae using the novel ME2TPTP model-to-text transformation detailed 
in section 4. The FOL-formula is then fed into a TPTP-compliant theorem prover 
(TPTP Solver). The solver may output a valid model if all input constraints 
are satisfiable. In this case, the output is transformed into a domain-compliant 
instance model through a TPTP2ME backwards mapping. Otherwise, if the input
constraints are inconsistent, the solver can either identify the inconsistency or
provide an undefined output (if it cannot decide by its decision procedures or
due to lack of computational resources).

Our approach is designed to generate a finite model rather than a finite 
counterexample of the input specifications. Such a task is facilitated by including 
size requirements for the desired model a priori. However, if size requirements 
are not provided, the theorem prover could easily check for inconsistencies in the 
input formulae due to the small-model theorem [14]. 

In addition to generating graph models from scratch, our approach is also 
capable of completing initial seed models. An initial model may be inconsistent 
(i.e. it may violate some metamodel or WF constraints), thus it is the task of 
the TPTP solver to extend the input model into a consistent instance model. 
Another use case is to validate the consistency of DSLs and modeling environ- 
ments [16,28]. Our approach is capable of detecting when constraints derived 
from a modeling environment are contradictory with each other. In this case, 
our approach can prove the unsatisfiability of the input constraints. 


4 From Domain-Specific Languages to First-Order Logic 


We discuss how the various components of a modeling language are mapped into
a set $\varphi = \varphi^{MM} \wedge \varphi^{IM} \wedge \varphi^{WF}$ of TPTP-compliant FOL formulae. The formula
$\varphi^{MM}$ is derived from the metamodel types (in section 4.1) and relations (in
section 4.2), as defined in section 2.1, along with additional constraints and
search parameters. $\varphi^{IM}$ describes the mapping for initial instance models (in
section 4.3). Finally, $\varphi^{WF}$ describes how additional WF constraints defined as
VIATRA queries are mapped into FOL formulae (in section 4.4). All components
of our mapping with the exception of lower multiplicities and WF constraints 
output Essentially Propositional Logic formulae. Proof systems for such formulae 
[23] do exist, but cannot be fully exploited on the output of our mapping. 


4.1 Mapping Types in the Metamodel 


The various types in the input EMF metamodel are mapped to FOL formulae 
as described below. 

Objects: A key idea in our mapping is that we use FOL constants (instead of 
other data types such as TPTP distinct objects) to represent the generated graph 
nodes. Constants are preferred due to their compatibility with our presented 
encoding (distinct objects cannot be used as arguments for FOL predicates). 

These constants are separated into two categories: first, nodes defined prior
to theorem proving are denoted with a set of constant symbols $Obj^O =
\{old_1, \dots, old_n\}$. This set includes known objects such as enum literals and
elements of the initial partial model. Additionally, the logic solver will add new
objects to the generated model, some of which are denoted with constant
symbols $Obj^N = \{new_1, \dots, new_m\}$. We also introduce a unary predicate object(o)
that selects all nodes of the graph model (including attribute values, enum
literals and objects). The object(o) predicate holds for all constants o in $Obj^O$ and
for some in $Obj^N$.
Fig. 6: Mapping type hierarchy


Type Hierarchy (TH): To handle complex generalization relations (e.g.
multiple inheritance) in the type hierarchy, we introduce formulae to control the
potential combinations of the type predicates. For this purpose, we map each
EClass of the input metamodel to a FOL predicate $C_i(o)$. A sample mapping is
shown in Figure 6 for an extract of the domain metamodel.

To express the mutual exclusiveness of (non-abstract) classes in the type
hierarchy, we construct a formula $d_O = \bigvee_{C_i \in S_{na}} t_i(C_i)$ in disjunctive normal form
(DNF) for the set $S_{na}$ of all non-abstract classes in the metamodel. For each
non-abstract type $C_i$, a conjunction $t_i(C_i)$ is created over all class predicates such
that a predicate $C_j$ is positive if and only if it is a member of the set $s(C_i)$ containing
$C_i$ and its superclasses, formally
$t_i(C_i) = \bigwedge_{C_j \in s(C_i)} C_j(o) \wedge \bigwedge_{C_j \notin s(C_i)} \neg C_j(o)$. We
must ensure that any constant satisfying the object(o) predicate also satisfies the
type hierarchy described in $d_O$. Thus, we generate the following FOL formula:
$\varphi^{MM}_{TH1} = \forall o : object(o) \Rightarrow d_O$. This is a filtered-types approach to type hierarchy
transformations used in the context of Object-Relation Mapping [17].

We also generate a formula to handle the negative case for the object predicate.
We specify that any constant o that does not satisfy the object(o)
predicate must not be an instance of any class in the metamodel. Formally,
the negation of object(o) implies a conjunction $t_{no}$ of the negations of all class
predicates $C_i$ in the metamodel (MM): $t_{no} = \bigwedge_{C_i \in MM} \neg C_i(o)$. The generated FOL
formula is as follows: $\varphi^{MM}_{TH2} = \forall o : \neg object(o) \Rightarrow t_{no}$.
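To make the construction of the two type hierarchy formulae concrete, the following sketch builds them as TPTP FOF strings for a four-class extract of the metamodel. The function names, the axiom names th1 and th2, and the lower-case predicate names are assumptions of this sketch (TPTP requires predicate symbols to start with a lower-case letter and variables with an upper-case one); it is not the ME2TPTP implementation.

```python
from typing import Dict, Iterable, List, Set


def superclass_closure(cls: str, parents: Dict[str, Iterable[str]]) -> Set[str]:
    """Return s(C): the class itself plus all transitive superclasses."""
    closure, stack = {cls}, [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in closure:
                closure.add(parent)
                stack.append(parent)
    return closure


def type_hierarchy_axioms(classes: Set[str],
                          parents: Dict[str, Iterable[str]],
                          abstract: Set[str]) -> List[str]:
    ordered = sorted(classes)
    terms = []
    for cls in ordered:
        if cls in abstract:
            continue  # abstract classes get no disjunct of their own
        positive = superclass_closure(cls, parents)
        literals = [f"{c}(O)" if c in positive else f"~{c}(O)" for c in ordered]
        terms.append("(" + " & ".join(literals) + ")")
    d_o = " | ".join(terms)                       # DNF over non-abstract classes
    t_no = " & ".join(f"~{c}(O)" for c in ordered)  # no class holds
    return [
        f"fof(th1, axiom, ![O]: (object(O) => ({d_o}))).",
        f"fof(th2, axiom, ![O]: (~object(O) => ({t_no}))).",
    ]


# Hypothetical extract of the Yakindu metamodel: Vertex is abstract.
CLASSES = {"vertex", "entry", "state", "region"}
PARENTS = {"entry": ["vertex"], "state": ["vertex"]}
ABSTRACT = {"vertex"}
AXIOMS = type_hierarchy_axioms(CLASSES, PARENTS, ABSTRACT)
```

Because vertex is abstract, no disjunct makes vertex the only positive predicate, which also covers the Abstract (ABS) constraint of section 2.1 in this sketch.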

Enumerations and Literals (EN) Mapping for enumerations is carried
out similarly to that of types. A unary predicate is created for each enum class
$E_i(o)$ in the input metamodel, and a distinct unary predicate $l_i(o)$ is created for
each literal of the enum class. The mapping of an enum class creates a disjunction
$d_l = \bigvee_{l_i} t_i(l_i)$. For each literal $l_i$, a conjunction $t_i$ is created, where only the
predicate corresponding to $l_i$ is positive and all others are negative, formally
$t_i(l_i) = l_i(o) \wedge \bigwedge_{l_j \neq l_i} \neg l_j(o)$. To ensure that generated enum instances are part
of the output model and that each literal is unique, a FOL constraint is generated
for each enum class stating that objects satisfy the corresponding predicate $E_i$
if and only if they also satisfy the object(o) predicate and the disjunction $d_l$:

$\varphi^{MM}_{EN1} = \forall o : E_i(o) \Leftrightarrow \left( object(o) \wedge \bigvee_{l_i} \left( l_i(o) \wedge \bigwedge_{l_j \neq l_i} \neg l_j(o) \right) \right)$
EN1: $\forall o : EntryType(o) \Leftrightarrow (object(o) \wedge ((Normal(o) \wedge \neg History(o)) \vee (\neg Normal(o) \wedge History(o))))$
EN2-N: $\forall o : (o = eo_1 \Leftrightarrow Normal(o))$
EN2-H: $\forall o : (o = eo_2 \Leftrightarrow History(o))$


Fig. 7: Mapping enumerations 


Each enum literal is also transformed into an individual FOL constraint that
instantiates a constant $eo_i$ to define an enum object for each $l_i$ that is associated
with $E_i$. The generated FOL constraint ensures that the output model contains a
constant $eo_i$ corresponding to each enum literal: $\varphi^{MM}_{EN2} = \forall o : (o = eo_i \Leftrightarrow l_i(o))$.
Example 4. To better understand this mapping, we consider the EntryType 
enum in Figure 1. We omit the DeepHistory literal for the sake of conciseness. 
This enum is mapped into the 3 FOL statements shown in Figure 7. 
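The three statements of Example 4 can be produced mechanically from the enum name and its literals. The sketch below emits them as TPTP FOF strings; the function name, the axiom names and the lower-case spelling of the predicates are assumptions made for illustration only.

```python
from typing import List


def enum_axioms(enum: str, literals: List[str]) -> List[str]:
    """Build EN1 for the enum class and one EN2 axiom per literal."""
    terms = []
    for lit in literals:
        # Exactly one literal predicate is positive in each conjunction.
        parts = [f"{other}(O)" if other == lit else f"~{other}(O)"
                 for other in literals]
        terms.append("(" + " & ".join(parts) + ")")
    d_l = " | ".join(terms)
    en1 = (f"fof(en1_{enum}, axiom, "
           f"![O]: ({enum}(O) <=> (object(O) & ({d_l})))).")
    # Each literal l_i is tied to its own constant eo_i.
    en2 = [f"fof(en2_{lit}, axiom, ![O]: ((O = eo{i}) <=> {lit}(O)))."
           for i, lit in enumerate(literals, start=1)]
    return [en1] + en2


ENTRY_TYPE_AXIOMS = enum_axioms("entrytype", ["normal", "history"])
```

For the EntryType enum without DeepHistory, this yields one EN1 axiom and the two EN2 axioms for the constants eo1 and eo2.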


Model Scope: Our mapping also allows users to specify a scope (size) for
the generated model as a search parameter. A scope may contain an upper bound
$u$ and a lower bound $l$ for the number of generated objects in the output model.
For an upper bound specification $u$, we define $Obj^N = \{new_1, \dots, new_{u - |Obj^O|}\}$
where $Obj^O$ is the set of nodes defined prior to theorem proving. If $u - |Obj^O|$ is
negative then the problem is surely inconsistent. We then generate a FOL expression
which specifies that any constant o satisfying object(o) must be contained
in either $Obj^O$ or $Obj^N$, to ensure that the theorem prover does not generate
any further constants (that satisfy object(o)) as part of the output finite model.

$\varphi^{MM}_{MUB} = \forall o : object(o) \Rightarrow \bigvee_{old_i \in Obj^O} (o = old_i) \vee \bigvee_{new_i \in Obj^N} (o = new_i)$

For a lower bound specification $l$, we define $m' = l - |Obj^O|$ and we create
a set $Obj^N_l \subseteq Obj^N$ containing $m'$ constants that are also in $Obj^N$. In the case
where $Obj^N$ is not defined (an upper bound value has not been specified), we
define $Obj^N_l = \{new_1, \dots, new_{l - |Obj^O|}\}$. We then generate a FOL formula to
specify that any object o that is either in $Obj^O$ or in $Obj^N_l$ must also satisfy
object(o) to ensure that these constants are part of the output finite model:

$\varphi^{MM}_{MLB} = \forall o : \left( \bigvee_{old_i \in Obj^O} (o = old_i) \vee \bigvee_{new_i \in Obj^N_l} (o = new_i) \right) \Rightarrow object(o)$


Example 5. To generate a model that contains from 4 to 6 objects, 2 of which are
already defined (e.g. enum literals), the following FOL statements are derived:
MUB: $\forall o : object(o) \Rightarrow ((o = old_1) \vee (o = old_2) \vee (o = new_1) \vee (o = new_2) \vee (o = new_3) \vee (o = new_4))$
MLB: $\forall o : ((o = old_1) \vee (o = old_2) \vee (o = new_1) \vee (o = new_2)) \Rightarrow object(o)$
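The scope formulae of Example 5 can be generated from the list of predefined constants and the two bounds. The sketch below is one possible rendering as TPTP FOF strings; the function name, the axiom names mub and mlb, and the new1, new2, ... naming scheme are assumptions, and a negative difference between the upper bound and the number of predefined objects is reported as an error because the problem is then surely inconsistent.

```python
from typing import List, Optional


def scope_axioms(old: List[str],
                 lower: Optional[int] = None,
                 upper: Optional[int] = None) -> List[str]:
    axioms = []
    if upper is not None:
        extra = upper - len(old)
        if extra < 0:
            raise ValueError("upper bound is below the number of predefined objects")
        # Obj^O together with Obj^N = {new1, ..., new_(u - |Obj^O|)}.
        allowed = old + [f"new{i}" for i in range(1, extra + 1)]
        disjunction = " | ".join(f"(O = {c})" for c in allowed)
        axioms.append(f"fof(mub, axiom, ![O]: (object(O) => ({disjunction}))).")
    if lower is not None:
        # Obj^O together with the first l - |Obj^O| new constants.
        needed = max(lower - len(old), 0)
        required = old + [f"new{i}" for i in range(1, needed + 1)]
        if required:
            disjunction = " | ".join(f"(O = {c})" for c in required)
            axioms.append(f"fof(mlb, axiom, ![O]: (({disjunction}) => object(O))).")
    return axioms


# Example 5: between 4 and 6 objects, 2 of them predefined.
MUB, MLB = scope_axioms(["old1", "old2"], lower=4, upper=6)
```

For Example 5 the upper bound axiom ranges over old1, old2 and new1 to new4, while the lower bound axiom forces only old1, old2, new1 and new2 into the model.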
EMF: [metamodel extract omitted]

FOL:
TC:  $\forall v, t : (outgoingTransitions(v, t) \Rightarrow (Vertex(v) \wedge Transition(t)))$
     $\forall v, t : (source(t, v) \Rightarrow (Transition(t) \wedge Vertex(v)))$
MUL: $\forall t : Transition(t) \Rightarrow \exists v : source(t, v)$
MUU: $\forall t, v_1, v_2 : (source(t, v_1) \wedge source(t, v_2)) \Rightarrow v_1 = v_2$
INV: $\forall v, t : source(t, v) \Leftrightarrow outgoingTransitions(v, t)$
CO1: $\forall v, t : contains(v, t) \Leftrightarrow outgoingTransitions(v, t)$


Fig. 8: Mapping relations 


Type Scope: A scope may be specified for each particular type $C_i$. In the
case of an upper bound $u_i$, we define a set $Obj^N_{u_i}$ such that $u_i = |Obj^N_{u_i}|$. If a
model upper bound has been defined, then $Obj^N_{u_i} \subseteq Obj^N$ holds, and we specify
that any constant o satisfying object(o) and $C_i(o)$ must be contained in $Obj^N_{u_i}$:

$\varphi^{MM}_{TUB} = \forall o : (object(o) \wedge C_i(o)) \Rightarrow \bigvee_{new_j \in Obj^N_{u_i}} (o = new_j)$

In case of a lower bound $l_i$, we select a set $Obj^N_{l_i} \subseteq Obj^N_{u_i}$ (if $Obj^N_{u_i}$ is defined)
such that $l_i = |Obj^N_{l_i}|$. We then generate a FOL expression which specifies that
all constants in $Obj^N_{l_i}$ must also satisfy object(o) and $C_i(o)$:

$\varphi^{MM}_{TLB} = \forall o : \bigvee_{new_j \in Obj^N_{l_i}} (o = new_j) \Rightarrow (object(o) \wedge C_i(o))$


Uniqueness: For every model object mapped to a FOL constant $c_i$, we
must generate formulae to ensure that it is distinct from other objects. These
formulae are only generated in the case where a scope is defined. Assuming that
an ordering is defined for all $n$ constants $c_i$, we generate $n - 1$ FOL constraints
with increasing value of $j < n$: $\varphi^{MM}_{UN}(j, n) = \bigwedge_{j < i \le n} c_j \neq c_i$.
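The uniqueness constraints can be generated with a simple double loop over the ordered constants. The sketch below is illustrative: the function and axiom names are assumptions, and the output uses the TPTP infix inequality operator `!=`.

```python
from typing import List


def uniqueness_axioms(constants: List[str]) -> List[str]:
    """One axiom per constant c_j (except the last), stating that c_j
    differs from every constant that follows it in the ordering."""
    axioms = []
    n = len(constants)
    for j in range(n - 1):
        conjunction = " & ".join(f"{constants[j]} != {constants[i]}"
                                 for i in range(j + 1, n))
        axioms.append(f"fof(unique_{j + 1}, axiom, ({conjunction})).")
    return axioms


UNIQUE = uniqueness_axioms(["old1", "old2", "new1", "new2"])
```

For four constants this yields three axioms with 3 + 2 + 1 = 6 inequalities in total, i.e. every unordered pair is covered exactly once.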


4.2 Mapping Relations Between Metamodel Types 


Once type-related constraints are mapped into FOL formulae, relations between 
these types are mapped as binary predicates. 

Type Compliance (TC) Relations between classes and class attributes are
mapped into FOL in the same way (see section 2). Each relation and attribute
is mapped to a FOL predicate $R_i(o_1, o_2)$. When mapping relations, we must
ensure that the endpoint objects are type-compliant with the metamodel: for
each $R_i(o_1, o_2)$ that points from a class $C_1$ to a type $C_2$, we generate a formula

$\varphi^{MM}_{TC} = \forall o_1, o_2 : R_i(o_1, o_2) \Rightarrow (C_1(o_1) \wedge C_2(o_2))$.


Note that for the purpose of this specific mapping, inverse relations are con- 
sidered as two separate unidirectional relations. Figure 8 contains an example of 
such a case, with the corresponding TC mapping. 
Multiplicities (MUL) As the multiplicity of a unidirectional relation has a
lower and an upper bound, at most two FOL formulae will be generated. Lower
multiplicities of 0 and upper multiplicities of * do not generate any formulae.

Lower Multiplicity: Consider the relation $R_i(a, b)$ from $C_i(a)$ to $C_j(b)$ which
has a lower multiplicity $m \neq 0$. We generate the constraint that for all objects $a$
of type $C_i(a)$, there must exist at least $m$ distinct constants $b_1 \dots b_m$ connected
to $a$ through an $R_i(a, b_k)$ relation. The generated FOL constraint is:

$\varphi^{MM}_{MUL} = \forall a : \left( C_i(a) \Rightarrow \left( \exists b_1 \dots b_m : \left( \bigwedge_{k=1}^{m} R_i(a, b_k) \right) \wedge distinct(b_1 \dots b_m) \right) \right)$

Upper Multiplicity: Given the relation $R_i(a, b)$ introduced previously, let us
consider an upper multiplicity of $n \neq *$. We generate the constraint that if there
are $n + 1$ objects $b_1 \dots b_{n+1}$ connected to an object $a$ through $R_i(a, b_k)$ relations,
then there are at least 2 identical $b_k$ constants among $b_1 \dots b_{n+1}$. This means
that $b_1 \dots b_{n+1}$ are not pairwise distinct, formally $\neg distinct(b_1 \dots b_{n+1})$.

$\varphi^{MM}_{MUU} = \forall a, b_1 \dots b_{n+1} : \left( \bigwedge_{k=1}^{n+1} R_i(a, b_k) \right) \Rightarrow \neg distinct(b_1 \dots b_{n+1})$

Multiplicity formulae derived from a relation in Figure 1 are shown in Figure 8.
Note the asymmetric nature of the two formulae: lower multiplicities are more
difficult to satisfy for the prover as they might introduce an infinite model.
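The two multiplicity formulae can be rendered in TPTP by expanding distinctness into pairwise inequalities (as in Figure 3a) and its negation into a disjunction of equalities. The following sketch assumes hypothetical function and axiom names and lower-case predicate names; for the source reference with multiplicity 1..1 it yields the TPTP counterparts of the MUL and MUU rows of Figure 8.

```python
from itertools import combinations
from typing import Optional


def lower_multiplicity(rel: str, cls: str, m: int) -> Optional[str]:
    """At least m distinct targets for every instance of cls."""
    if m == 0:
        return None  # a lower multiplicity of 0 generates no formula
    targets = [f"B{k}" for k in range(1, m + 1)]
    links = [f"{rel}(A,{b})" for b in targets]
    distinct = [f"{x} != {y}" for x, y in combinations(targets, 2)]
    body = " & ".join(links + distinct)
    return (f"fof(mul_{rel}, axiom, ![A]: ({cls}(A) => "
            f"?[{','.join(targets)}]: ({body}))).")


def upper_multiplicity(rel: str, n: int) -> str:
    """Any n + 1 targets of the same source contain two equal ones."""
    targets = [f"B{k}" for k in range(1, n + 2)]
    links = " & ".join(f"{rel}(A,{b})" for b in targets)
    # not distinct(...) is a disjunction of pairwise equalities;
    # with a single target (n = 0) the relation must simply be empty.
    equal = " | ".join(f"{x} = {y}"
                       for x, y in combinations(targets, 2)) or "$false"
    return (f"fof(muu_{rel}, axiom, ![A,{','.join(targets)}]: "
            f"(({links}) => ({equal}))).")
```

Note that the upper bound formula grows quadratically in n because of the pairwise equalities, whereas the lower bound formula introduces existential variables, which matches the asymmetry discussed above.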

Inverse Relations (INV) As mentioned earlier, we consider inverse relations
as two separate (unidirectional) relations. The bidirectional nature of
such relations implies that both of their corresponding unidirectional relations
cannot exist without each other. Thus, we must ensure that two objects
a and b are connected by inverse relations $R_i(a, b)$ and $R_j(b, a)$ simultaneously:
$\varphi^{MM}_{INV} = \forall a, b : R_i(a, b) \Leftrightarrow R_j(b, a)$. An example can be seen in Figure 8.

Containment Hierarchy (CON) Containment hierarchy is enforced by 
the following constraints (see Figure 8 for examples): 


– Union of containment edges: We first define a disjunction $contains(o_1, o_2)$ of
all containment relations $R_{c_i}(o_1, o_2)$ in the metamodel. The generated FOL
formula is $\varphi^{MM}_{CO1} = \forall o_1, o_2 : contains(o_1, o_2) \Leftrightarrow \bigvee_{R_{c_i}} R_{c_i}(o_1, o_2)$.

– Existence of a unique root constant: We define a unique constant root as an
object that is not contained: $\varphi^{MM}_{CO2} = \forall r, o : (r = root \Rightarrow \neg contains(o, r))$.

– Container Object: We must ensure that every non-root object in the generated
model is contained by another object. Thus, any constant o that satisfies
object(o) is either the root constant root or is contained by another constant.
Formally, $\varphi^{MM}_{CO3} = \forall o : object(o) \Rightarrow (o = root \vee \exists p : contains(p, o))$.

– Single Container: We must also ensure that any constant o is contained by at
most one other constant. Thus, if o is contained by two constants $p_1$ and $p_2$,
then $p_1$ and $p_2$ are identical. Formally,
$\varphi^{MM}_{CO4} = \forall o, p_1, p_2 : (contains(p_1, o) \wedge contains(p_2, o)) \Rightarrow (p_1 = p_2)$.
EMF: sc1: Statechart --regions--> r1: Region
obj→const: $sc1 \mapsto old_1$, $r1 \mapsto old_2$

FOL:
$\forall o : SChart(o) \Leftrightarrow (SChart^O(o) \wedge \neg SChart^N(o)) \vee (\neg SChart^O(o) \wedge SChart^N(o))$
$\forall o : Region(o) \Leftrightarrow (Region^O(o) \wedge \neg Region^N(o)) \vee (\neg Region^O(o) \wedge Region^N(o))$
PS-C: $\forall o : Statechart^O(o) \Leftrightarrow o = old_1$, $\forall o : Region^O(o) \Leftrightarrow o = old_2$
PS-R: $regions(old_1, old_2)$

Fig. 9: Mapping instance models 


Avoidance of Cyclic Containment (CYC) Unfortunately, FOL is not
expressive enough to capture formulae required to avoid cyclic containment
relations (an example is shown in Figure 2b) in the output models. Therefore, we
generate approximated constraints to avoid cycles up to length $n$ given as an
input parameter. For that purpose, we derive separate formulae for each length $x$
(with $0 < x \le n$) using the $contains(o_1, o_2)$ predicate defined in $\varphi^{MM}_{CO1}$. Formally,

$\varphi^{MM}_{CYC}(x) = \neg \exists o_1 \dots o_x : \left( \bigwedge_{i=1}^{x-1} contains(o_i, o_{i+1}) \right) \wedge contains(o_x, o_1)$.
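The bounded nature of this approximation can be illustrated with a short sketch. The first function emits one TPTP axiom per cycle length, assuming the negated-existential reading of the constraint; the second checks a concrete containment relation for closed walks of a given length, which is exactly what an axiom of that length forbids. All function and axiom names are hypothetical.

```python
from typing import List, Set, Tuple


def cycle_axioms(n: int) -> List[str]:
    """One axiom per length x in 1..n forbidding containment cycles."""
    axioms = []
    for x in range(1, n + 1):
        vs = [f"O{i}" for i in range(1, x + 1)]
        edges = [f"contains({vs[i]},{vs[i + 1]})" for i in range(x - 1)]
        edges.append(f"contains({vs[-1]},{vs[0]})")  # close the cycle
        axioms.append(f"fof(cyc{x}, axiom, "
                      f"~?[{','.join(vs)}]: ({' & '.join(edges)})).")
    return axioms


def has_closed_walk(edges: Set[Tuple[str, str]], x: int) -> bool:
    """True if some object reaches itself in exactly x containment steps."""
    reach = set(edges)
    for _ in range(x - 1):
        reach = {(a, d) for (a, b) in reach for (c, d) in edges if b == c}
    return any(a == b for a, b in reach)


# A containment cycle over six objects (hypothetical names).
SIX_CYCLE = {(f"o{i}", f"o{(i + 1) % 6}") for i in range(6)}
```

With the bound of 5, a six-object cycle such as `SIX_CYCLE` satisfies every generated axiom even though it is cyclic, which is why the constraints are only an approximation.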


4.3 Instance model mapping 


When mapping an instance model $P = (O_P, I_P)$ as a partial snapshot, we
transform its objects $O_P = \{o_1, \dots, o_n\}$ to a set of constants $Const_P =
\{old_1, \dots, old_n\}$ while maintaining a trace map $t : O_P \to Const_P$. Additionally,
all classes $C$ which have an instance in the instance model are split into
two categories, $C^O$ and $C^N$, that differentiate the old (i.e. $old_1, \dots, old_n$) and new
objects (generated by the solver). Finally, if a class predicate $C_i$ is true in the
partial model, $I_P(C_i)(o) = 1$, then it must be true in the generated model too,
which is enforced by formula $C_i^O(t(o))$. Similarly, if a reference predicate $R_j$ is
true in the partial model, $I_P(R_j)(o_1, o_2) = 1$, then it also must be true in the
generated model, which is enforced by formula $R_j(t(o_1), t(o_2))$.

Sample generated FOL formulae for an instance model are shown in Figure 9.


4.4 Mapping additional constraints 


The modeling environment of our approach may contain additional FOL patterns
and WF constraints defined in the Viatra Query Language (VQL). The header
of each VQL pattern taking $n$ parameters as input is mapped to a predicate
$ph_i(v_1 \dots v_n)$. The pattern body is mapped into a FOL statement $\varphi_{pb_i}(v_1 \dots v_n)$
according to its FOL content such that if a set of $n$ variables satisfies the associated
pattern header predicate, it must also satisfy the specifications described in
$\varphi_{pb_i}(v_1 \dots v_n)$: $\varphi^{WF}_{WF1} = \forall v_1 \dots v_n : ph_i(v_1 \dots v_n) \Rightarrow \varphi_{pb_i}(v_1 \dots v_n)$.

For patterns that are specified as WF constraints, an additional FOL formula
is generated to ensure that such patterns have no match in the generated
model. Structurally, the corresponding FOL formula checks that no
objects $v_1 \dots v_n$ satisfy the condition of the pattern: $\varphi^{WF}_{WF2} = \forall v_1 \dots v_n :
\neg ph_i(v_1 \dots v_n)$. Figure 10 shows the mapping for patterns specified in Figure 4.
VQL:
    pattern transition(t,src,trg){
      Transition.source(t,src);
      Transition.target(t,trg); }

    @Constraint
    pattern incomingToEntry(t, e:Entry){
      find transition(t,_,e); }

FOL:
WF1-TRA: $\forall t, src, trg : transition(t, src, trg) \Rightarrow source(t, src) \wedge target(t, trg)$
WF1-ITE: $\forall t, e : ite(t, e) \Rightarrow (Entry(e) \wedge (\exists s : transition(t, s, e)))$
WF2-ITE: $\forall t, e : (Transition(t) \wedge Entry(e)) \Rightarrow \neg ite(t, e)$

Fig. 10: Mapping VQL patterns and WF constraints
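The pattern mapping can be sketched as a small helper that emits the header implication for every pattern and, for patterns marked as constraints, the additional formula that forbids matches. The helper name, the axiom names, the optional type guard and the `pattern_` prefix are assumptions of this sketch; the prefix is used only to keep pattern predicates apart from the lower-case class predicates (e.g. the Transition class versus the transition pattern).

```python
from typing import List, Optional


def pattern_axioms(header: str, variables: List[str], body: str,
                   constraint: bool = False,
                   guard: Optional[str] = None) -> List[str]:
    quantified = ",".join(variables)
    head = f"{header}({quantified})"
    # Header predicate implies the FOL content of the pattern body.
    axioms = [f"fof(wf1_{header}, axiom, "
              f"![{quantified}]: ({head} => ({body})))."]
    if constraint:
        # Error patterns must not have any match in the generated model.
        rule = f"(({guard}) => ~{head})" if guard else f"(~{head})"
        axioms.append(f"fof(wf2_{header}, axiom, ![{quantified}]: {rule}).")
    return axioms


TRA = pattern_axioms(
    "pattern_transition", ["T", "Src", "Trg"],
    "source(T,Src) & target(T,Trg)")
ITE = pattern_axioms(
    "pattern_ite", ["T", "E"],
    "entry(E) & ?[S]: (pattern_transition(T,S,E))",
    constraint=True, guard="transition(T) & entry(E)")
```

The transition pattern yields a single axiom since it is only a helper, while incomingToEntry yields two because it is annotated with @Constraint.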


5 Evaluation 


We conduct several measurements to address the following research questions: 


RQ1: Which TPTP-compliant theorem provers are most scalable wrt. model 
size and runtime of model generation? 

RQ2: How do theorem provers scale compared to other logic solvers for a model 
generation scenario? 


Target domain: To address these questions, we perform model generation sce- 
narios and analyze the results in the context of the Yakindu Statecharts industrial 
modeling environment introduced in section 2.1. We use the metamodel shown 
in Figure 1, which contains 13 classes, including an enum class, and 6 refer- 
ences. Moreover, the Yakindu metamodel covers all mapping rules introduced 
in section 4. We also formalize 17 WF constraints as graph predicates to fur- 
ther restrict the model generation scope. Finally, we provide an initial instance 
model as a seed for model generation which contains only a single root node, 
thus the underlying solvers have full responsibility in model generation. Exam- 
ples of input and output files as well as our measurement results are on GitHub. 
Altogether, Yakindu Statecharts provide a sufficiently complex case to assess the 
proposed mapping and the underlying theorem provers, and it has been used as 
a case study in existing papers of model generation [27,30]. 


5.1 Research Question 1 (RQ1) 


Measurement setup: We compare the scalability of all TPTP-compliant theorem
provers available on the System on TPTP⁵ website, which is the official
TPTP web interface for solving FOL problems for theorem proving competitions.
System on TPTP lists 73 solvers and 87 different solver configurations
that can be called directly on their servers⁶ through HTTP requests.

Our experimentation consists of three phases. For all three phases, we gen- 
erate constraints to avoid containment cycles of up to 5 objects, which is a 
parameter used in existing research such as [28]. 

PHASE I: As a preliminary step, we attempt to generate a small model con- 
taining 9-10 nodes within a time limit of 1 minute with each listed TPTP-prover. 


4 https://github.com/ArenBabikian/publication-pages/wiki/Automated-Generation-of-Consistent-Graph-Models-with-Theorem-Provers

5 http://tptp.cs.miami.edu/cgi-bin/SystemOnTPTP 

6 Intel Xeon CPU E5-4610 2.40GHz, 128GB RAM, Linux 3.10.0 
Note that of the 9-10 output nodes, 3 nodes are enforced by the enum mapping,
1 node is defined in the initial model and 5-6 nodes must be generated
by the theorem prover. We perform this experimentation three times and we 
manually analyze the output. If a theorem prover is unable to read the input 
TPTP problem or is incapable of generating a finite model according to the 
specifications, it is disqualified for the subsequent two steps of our workflow. 

PHASE II: This phase involves small-size model generation to further eliminate
weak TPTP solvers. For each qualified solver, we generate finite models
with increasing size (starting from 5 objects as a lower bound, with a step size of 
5 objects). We set a timeout of 1 minute for each generation run. We execute each 
generation run 10 times and take the median of the execution times of successful 
runs (i.e. that provide a finite model as result within the given timeout). 

We also measure the ratio of failed runs for each model size. We end the 
sequence of model generations for a given solver if all 10 runs at the same size 
specification fail to output a finite model. Considering that we are running the 
measurements on a server, we cannot influence warm-up effects and memory 
handling. After this second phase, we keep the (four) best performing solvers. 

PHASE III: We complete our experimentation by performing large-scale 
model generation. For this phase, we perform the same data collection as for 
PHASE II. However, we begin model generation at a size of 30 objects and use 
a step size of 10 objects. Furthermore, we use a timeout of 5 minutes and we 
perform each generation run 20 times. 
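
The data collection shared by the second and third phases can be summarized by the following minimal Python sketch. It is an illustration only and not the tooling used for the experiments: the names `measure` and `fake_solver` are hypothetical, and `run_once` stands for any callable that returns the runtime in seconds of one generation run, or `None` if no finite model was produced.

```python
import statistics

def measure(run_once, start, step, runs, timeout_s):
    """Collect (size, median runtime, failure ratio) for increasing sizes.

    The sequence ends at the first size for which every run fails.
    """
    results = []
    size = start
    while True:
        times = []
        for _ in range(runs):
            t = run_once(size, timeout_s)
            # A run counts as successful only if it finished within the timeout.
            if t is not None and t <= timeout_s:
                times.append(t)
        failure_ratio = 1 - len(times) / runs
        median = statistics.median(times) if times else None
        results.append((size, median, failure_ratio))
        if not times:          # all runs failed: stop the sequence
            break
        size += step
    return results

def fake_solver(size, timeout_s):
    """Stand-in solver (assumption): succeeds up to 15 objects, fails above."""
    return size / 10 if size <= 15 else None

# Settings of the second phase: start at 5 objects, step 5, 10 runs, 1 minute.
phase2 = measure(fake_solver, start=5, step=5, runs=10, timeout_s=60)
```

With the third-phase settings, the same function would be called with `start=30`, `step=10`, `runs=20` and `timeout_s=300`.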


Scalability in model size: We compare the model sizes that the TPTP solvers are able to derive. 

PHASE I: Among the 87 prover configurations provided on the TPTP server, 
only 8 configurations were able to generate models with 9-10 objects, namely 
CVC4 (SAT-1.7), DarwinFM (1.4.5), E-Darwin (1.5), Geo-III (2018C), iProver (SAT-3.0), 
Paradox (4.0), Vampire (SAT-4.4) and Z3 (4.4.1). The MACE2 (2.2) prover also 
claimed to generate a finite model for the given inputs. However, after manual 
analysis of the output, no generated finite model was found. As a result, we 
decided to drop MACE2 from the following measurement phases. 

PHASE II: Figure 11a presents the complete measurements for scalability 
analysis of the 4 least scalable remaining solver configurations. PHASE II results 
for the 4 more scalable solver configurations are included in Figure 11b, along 
with their results for PHASE III. Figure 11a contains the median runtime (as 
provided by the server) of successful model generations wrt the size of the gen- 
erated model while the runtime required for the mapping itself is excluded (as 
it is negligible). Measurements for PHASE II are performed for models of up to 
25 objects, while measurements for larger models correspond to PHASE III. 

Figure 11c presents the ratio of failed model generation runs wrt. model 
size. When all runs fail in generating models, the failure ratio becomes 1 and 
no further model generation runs are performed. Notice that solvers CVC4, DarwinFM, 
E-Darwin and Geo-III are unable to generate models of 30 objects within 
the 1-minute timeout period, thus they are excluded from further experiments. 

Fig. 11: Results of PHASE II and PHASE III measurements (incl. failure rates) 

PHASE III: Figure 11b shows that iProver and Z3 dominate in terms of scalability. 
There is a steady increase in runtime with respect to generated model 
size; however, we notice certain inconsistencies when failure rates increase as 
the generated models become larger. Both solvers can generate models of 140 
objects: iProver can do so at a faster rate, however, Z3 does so more consistently 
with respect to failures. Moreover, it is interesting to see that existing model gen- 
eration approaches that used Z3 as an underlying solver [28,32] report inferior 
results with respect to the size of generated (fully connected) models. 

The Paradox solver provides very fast model generation for models of up to 
110 objects. Although failure rates are high for large models, by inspecting the 
measurement data, we notice that Paradox explicitly reports (within timeout) 
that it is unknown if a model can be generated for the given input. 

Scalability of the Vampire solver lags behind that of the other solvers. We 
observe an interesting pattern in failure rates for Vampire: the solver fails often 
when generating not only large models, but also very small models. In fact, 
analysis of measurement data shows that in these cases, Vampire states that the 
input constraints are satisfiable, but it does not generate a finite model. This 
behavior is similar to that of Paradox, since failures are not caused by timeouts. 


Runtime of solvers: Runtime differences between solvers are negligible for 
generated models of size 20 and under. For models larger than 20 nodes, Paradox 
was the fastest solver as highlighted in Figure 11b. For models with 120 objects 
or more, iProver is slightly faster than Z3. However, increased failure rates for 
iProver make the measured median values less reliable than those of Z3. 

RQ1: Only 9% (8/87) of theorem prover configurations presented in the Sys- 
tem on TPTP website are able to generate small models. Only 4 configurations 
can generate larger models containing 30 nodes. iProver and Z3 are the most 
scalable provers and are able to generate models of 140 nodes, while Paradox is 
significantly faster than other solvers for models of up to 110 nodes. 

Fig. 12: Results of RQ2 measurements, including failure rates. 


5.2 Research Question 2 (RQ2) 


Measurement setup: We compare the model generation scalability of the Vam- 
pire (4.4) theorem prover to that of two other approaches that use Alloy (4.2) [13] 
and VIATRA Solver [27,30] as back-end solvers, respectively. We select Vampire 
for our experimentation as it is the most scalable theorem prover that we are 
able to run locally using generated TPTP files as input. We use the most recent 
stable releases of the solvers to generate graphs of increasing size (starting from 
models with exactly 20 objects, and an increment of exactly 20 objects). 

We generate constraints to avoid containment cycles of up to 5 objects and 
we set a timeout of 5 minutes. We execute 20 runs per generated graph size and 
take the median of the execution times of successful runs (i.e. that provide a 
finite model as result within the given timeout). To account for warm-up effects 
and memory handling of the Java 8 VM, we add an extra 5 runs before the 
actual measurements and call the garbage collector explicitly between runs. We 
perform measurements on an average personal computer⁷ with local installation 
of solvers. We end the sequence of model generations if none of the 20 runs at the 
same size specification provides a generated finite model. 


Scalability in model size: Figure 12a presents the scalability measurements 
for the Vampire, Alloy and VIATRA solvers. Figure 12b presents the corresponding 
failure rates. VIATRA was able to generate models of up to 1380 objects, but data 
points are shown in Figure 12a and Figure 12b for models only up to 180 nodes. 
We notice that our mapping using the Vampire solver slightly outperforms Alloy, 
but both approaches are significantly outperformed by the VIATRA-solver, which 
is coherent with previous research results [30]. The variation in Vampire perfor- 
mance (cf. Figure 11b and Figure 12a) is attributed to the different measurement 
environments and Vampire versions used to assess each research question. 

RQ2: Using Vampire as a back-end solver, our approach scales for 20% larger 
models with fewer failures compared to an Alloy-based approach, but it is 
outperformed by the VIATRA-based approach. 


7 Intel Core i7-8550U CPU@1.80GHz, 16 GB RAM, Windows 10, Java 1.8, 8 GB Heap 

5.3 Threats to validity 


Internal Validity: The measurements for RQ1 are performed on a server 
that acts as a black box with regards to our experimentation. We mitigate this 
threat by using the same server for the entirety of RQ1 experimentation. Nev- 
ertheless, we take the server runtime output as is for our experimentation. We 
cannot perform further analysis regarding potential warm-up time and garbage 
collection, which is mitigated for the experimentation of RQ2. Furthermore, we 
make comparisons between our approach and others that use the same back-end 
solvers (namely, Z3) for model generation. However, we must be aware of the 
different measurement setups used for each implementation. 

External Validity: Our approach is limited to a single domain selected 
based on its past use in related lines of existing research [27, 29, 30, 37]. The 
domain of Yakindu Statecharts is sufficiently complex to cover all features of 
our mapping, thus we expect similar scalability results in other domains. 

Construct Validity: For RQ1, we specify a scope ranging from 9 to 10 
objects for PHASE I, while we only provide a lower-bound scope specification for 
the other phases. As for RQ2, we ask for an exact number of generated objects. 
These scope specifications may be disadvantageous for certain solvers (e.g. Alloy, 
if no upper bound is specified). We mitigate this threat by staying consistent in 
scope specifications throughout a research question or phase. 


6 Related work 


We provide an overview of various graph generation approaches that derive con- 
sistent graphs. 


Model generators using back-end logic solvers: These approaches trans- 
late graphs and WF constraints into logic formulae and use a logic solver to gen- 
erate graphs that satisfy them. EMF2CSP/UML2CSP [8,10] translates model 
generation to a constraint programming problem, and solves it by use of an 
underlying CSP solver. ASMIG [38] uses the Z3 SMT solver [22] to generate 
typed and attributed graphs with inheritance. An advanced model generation 
approach is presented in the Formula framework [15] also using the Z3 SMT 
solver. AutoGraph [26] generates consistent attributed multidimensional graphs 
by separating the generation of the graph structure and the attributes. Graph 
generation is driven by a tableau approach, while attribute handling uses the 
Z3 SMT-solver. [28] proposes a mapping of EMF models enriched with derived 
features for the formal validation of DSLs. Model generation for this purpose is 
performed by using Z3 and Alloy as backend solvers. 

Logic-solver based generators do ensure consistency and they can also detect 
inconsistencies in a specification. However, their scalability is comparable to 
our approach. In fact, we managed to improve scalability of model generation 
compared to results reported in [28] using Z3 as a back-end solver. 


Custom consistent model generators: Cartesian genetic programming 
(CGP) [21] encodes graphs with linear or grid-based genotypes and produces new 
ones by evolving the initial graph, originally used to produce electronic circuits. 
Recent work [3, 4] introduces evolving graphs by graph programming, CGP’s 
generalization to arbitrary graphs. However, consistency of models is addressed 
only on a best-effort basis, i.e. there is no formal guarantee of consistency. 

SDG [31] proposes an approach that uses a search-based custom OCL solver 
to generate synthetic data for statistical testing. Generated models are multi- 
dimensional and consistent. The study claims scalability by generating a large 
set of small models. Research in [32] proposes a hybrid approach that uses both 
a meta-heuristic search-based OCL solver [2] for structural constraints and an 
SMT solver for attribute constraints, based on the snapshot generator of the 
USE framework [9]. Generated typed models are (locally) consistent and large, 
but not fully connected (a large family of small models are generated). The VIA- 
TRA graph solver [30] is able to generate large and consistent (fully connected) 
models by lifting SAT solving algorithms to the level of graphs, and exploiting 
partial modeling techniques. 

Custom approaches are more scalable than our approach, but the incon- 
sistency of a DSL specification cannot be detected, thus, there is no graceful 
degradation in the case when no consistent models can be derived. 


7 Conclusion and Future Work 


In this paper, we provided a mapping of DSL specifications consisting of an EMF 
metamodel and well-formedness constraints into first-order logic formulae to be 
fed into TPTP-compliant theorem provers. As such, we successfully integrated 
more than 70 different theorem provers for model generation purposes. However, 
our scalability evaluation of these theorem provers carried out in the scope of 
an industrial DSL tool revealed that most of those provers cannot be effectively 
used for model generation purposes — not even for very small models. While 
these solvers can potentially be efficient in detecting inconsistencies of FOL 
specifications, our experiments revealed that a different solver profile would be 
beneficial for model generation purposes despite the similarity in the underlying 
logic formalization. On the positive side, our mapping improved scalability when 
using Z3 as a back-end theorem prover for model generation purposes. 

As we obtained negative scalability results for the vast majority of theorem 
provers, we believe that our case study can serve as an interesting benchmark 
case for future TPTP competitions as part of future work. Moreover, we plan 
to better exploit theorem provers in cases where no models can exist due to 
inconsistencies, regardless of model size, by combining calls to TPTP solvers with 
custom graph model generation techniques. In this case, TPTP solvers may be 
able to highlight a minimal set of unsatisfiable elements, which can be checked 
subsequently during the exploration to prevent inconsistent dead ends. 


Abstract. To model real-world software systems, modelling paradigms 
should support a form of compositionality. In interface theory and model- 
based testing with inputs and outputs, conjunctive operators have been 
introduced: the behaviour allowed by composed specification $s_1 \wedge s_2$ is 
the behaviour allowed by both partial models $s_1$ and $s_2$. The models 
at hand are non-deterministic interface automata, but the interaction 
between non-determinism and conjunction is not yet well understood. 
On the other hand, in the theory of alternating automata, conjunction 
and non-determinism are core aspects. Alternating automata have not 
been considered in the context of inputs and outputs, making them less 
suitable for modelling software interfaces. In this paper, we combine the 
two modelling paradigms to define alternating interface automata (AIA). 
We equip these automata with an observational, trace-based semantics, 
and define testers, to establish correctness of black-box interfaces with 
respect to an AIA specification. 


1 Introduction 


The challenge of software verification is to ensure that software systems are cor- 
rect, using techniques such as model checking and model-based testing. To use 
these techniques, we assume that we have an abstract specification of a system, 
which serves as a description of what the system should do. A popular approach 
is to model a specification as an automaton. However, the huge number of states 
in typical real-world software systems quickly makes modelling with explicit au- 
tomata infeasible. A form of compositionality is therefore usually required for 
scalability, so that a specification can be decomposed into smaller and under- 
standable parts. Parallel composition is based on a structural decomposition of 
the modelled system into components, and it thus relies on the assumption that 
components themselves are small and simple enough to be modelled. This as- 
sumption is not required for logical composition, in which partial specification 
models of the same component or system are combined in the manner of logical 
conjunction. Formally, for a composition to be conjunctive, the behaviour 
allowed by $s_1 \wedge s_2$ is the behaviour allowed by both partial specifications $s_1$ and 
$s_2$. Such a composition is important for scalability of modelling, as it allows 
writing independent partial specifications, sometimes called view modelling [3]. 
On a fundamental level, specifications can be seen as logical statements about 
software, and the existence of conjunction on such statements is only natural. 
Conjunctive operators have been defined in many language-theoretic modelling 
frameworks, such as for regular expressions [12] and process algebras [5]. 


1.1 Conjunction for Inputs and Outputs 


A conjunctive operator $\wedge$ has also been introduced in many automata frameworks 
for formal verification and testing, such as interface theory [8], ioco theory [3] 
and the theory of substitutivity refinement [7]. Within these theories, 
systems are modelled as labelled transition systems [15] or interface automata [1] 
(IA), and actions are divided into inputs and outputs. 

An informal example of some (partial) specification models, as could be ex- 
pressed in these theories, is shown by the automata in Figure 1, in which inputs 
are labelled with question marks, and outputs with exclamation marks. The 
specifications represent a vending machine with two input buttons (?a and ?b), 
which provides coffee (!c) and tea (!t) as outputs, optionally with milk (!c+m 
and !t+m). The first model, p, specifies that after pressing button ?a, the machine 
dispenses coffee. The second model, q, specifies that after pressing button 
?b, the machine has a choice between dispensing tea, or tea with milk. The third 
model, r, is similar, but uses non-determinism to specify that button ?b results 
in coffee with milk or tea with milk. 

The fourth model, $p \wedge q \wedge r$, states that all three former partial models should 
hold. Here, we use the definition of $\wedge$ from [3], but the definition from [7] is 
similar. An input is specified in the combined model if it is specified in any 
partial model, making both buttons ?a and ?b specified. Additionally, an output 
is allowed in the combined model if it is allowed by all partial models, meaning 
that after button ?b, only tea with milk is allowed. 
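
The rule described above (union of specified inputs, intersection of allowed outputs) can be illustrated on the vending machine with a minimal Python sketch. This is a one-step simplification and not the definition of the operator from [3]: each partial model is reduced to a hypothetical dictionary from a button to the set of outputs allowed after pressing it, and the name `combine` is our own.

```python
# Each partial model maps a specified input to the outputs allowed afterwards.
p = {"?a": {"!c"}}
q = {"?b": {"!t", "!t+m"}}
r = {"?b": {"!c+m", "!t+m"}}

def combine(*models):
    """Inputs: specified if ANY model specifies them (union).
    Outputs: allowed only if ALL models specifying that input allow them
    (intersection)."""
    combined = {}
    for model in models:
        for button, outputs in model.items():
            if button in combined:
                combined[button] = combined[button] & outputs
            else:
                combined[button] = set(outputs)
    return combined

vending = combine(p, q, r)
# Both buttons are specified; after ?b only tea with milk remains allowed.
```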


Fig. 1. Three independent specifications for a vending machine, and their conjunction. 


1.2 Conjunctions of states 


This form of conjunctive composition acts as an operator on entire models. 
However, a partial specification could also describe the expected behaviour of 
a particular state of the system, other than the initial state. For example, suppose 
that the input ?on turns the vending machine on, after which the machine 
should behave as specified by p, q and r from Figure 1. This, by itself, is also a 
specification, illustrated by s in Figure 2. However, the formal meaning of this 
model is unclear: transitions connect states, whereas $p \wedge q \wedge r$ is not a state but 
an entire automaton. A less trivial case is partial specification t, also in Figure 2: 
after obtaining any drink by input ?take, we should move to a state where we 
can obtain a drink as described by specifications p, q, r and t. Thus, we combine 
conjunctions with a form of recursion. This cannot easily be formalized using $\wedge$ 
as an operator on automata, like in [3,7,8]. Defining conjunction as a composition 
on individual states would provide a formal basis for these informal examples. 


Fig. 2. Two specifications with transitions to a conjunction. 


Conjunctions of states are a main ingredient of alternating automata [6], in 
which conjunctions and non-determinism alternate. Here, non-determinism acts as 
logical disjunction, dually to conjunction. Because of this duality, both conjunction 
and disjunction are treated analogously: both are encoded in the transition 
relation of the automaton. This contrasts with the approach of defining conjunction 
directly on IAs, where non-determinism is encoded in the transition relation of 
the IA, whereas conjunction is added as an operator on IAs, leaving the duality 
between the two unexploited. In fact, the conjunction operator in [3] even requires 
that any non-determinism in its operands is removed first, by performing 
an exponential determinization step. For example, model r in Figure 1 is 
non-deterministic, and must be determinized to the form of model q before $p \wedge q \wedge r$ 
is computed. This indicates that it is hard to combine conjunction and 
non-determinism in an elegant way, without understanding their interaction. 

Despite their inherent support for conjunction, alternating automata are not 
entirely suitable for modeling the behaviour of software systems, since they lack 
the distinction between inputs and outputs. In this respect, alternating automata 
are similar to deterministic finite automata (DFAs). Distinguishing inputs and 
outputs in an IA allows modelling of software systems in a less abstract way than 
with the homogeneous alphabet of actions of DFAs and alternating automata. 


1.3 Contributions 


We combine concepts from the worlds of interface theory and alternating au- 
tomata, leading to Alternating Interface Automata (AIAs), and show how these 
can be used in the setting of a trace semantics for observable inputs and outputs. 
We provide a solid formal basis of AIAs, by 


— combining alternation with inputs and outputs (Section 3.1), 

— defining a trace semantics for AIAs (Section 3.2), by lifting the input-failure 
refinement semantics for non-deterministic interface automata [11] to AIAs, 

— providing insight into the semantics of an AIA, by defining a determinization 
operator (Section 3.3) and a transformation between IAs and AIAs 
(Section 3.4), and 

— defining testers (Section 4), which represent practical testing scenarios for 
establishing input-failure refinement between a black-box implementation IA 
and a specification AIA, analogously to ioco test case generation [15]. 


The definition of input-failure refinement [11] is based upon the observation 
that, for a non-deterministically reached set of states Q, the observable outputs 
of that set are the union of the outputs of the individual states in Q, whereas the 
specified inputs for Q are the intersection of the inputs specified in individual 
states in Q. For conjunction, we invert this: outputs allowed by a conjunction 
of states are captured by the intersection, whereas specified inputs are captured 
by the union. In this way, our AIAs seamlessly combine the duality between 
conjunction and non-determinism with the duality between inputs and outputs. 

Proofs can be found in the extended technical report [10]. 


2 Preliminaries 


We first recall the definition of interface automata [1] and input-failure refine- 
ment [11]. The original definition of IAs [1] allows at most one initial state, but 
we generalize this to sets of states. Moreover, [1] supports internal actions, which 
we do not need. Transitions are commonly encoded by a relation, whereas we 
use a function. 


Definition 1. An Interface Automaton (IA) is a 5-tuple $(Q, I, O, T, Q^0)$, where 


— $Q$ is a set of states, 

— $I$ and $O$ are disjoint sets of input and output actions, respectively, 

— $T : Q \times (I \cup O) \to \mathcal{P}(Q)$ is an image-finite transition function (meaning 
that $T(q, \ell)$ is finite for all $q$ and $\ell$), and 

— $Q^0 \subseteq Q$ is a finite set of initial states. 


The domain of IAs is denoted $\mathcal{IA}$. For $s \in \mathcal{IA}$, we refer to its respective 
elements by $Q_s$, $I_s$, $O_s$, $T_s$, $Q_s^0$. For $s_1, s_2, \ldots, s_A, s_B, \ldots$ a family of IAs, 
we write $Q_j$, $I_j$, $O_j$, $T_j$ and $Q_j^0$ to refer to the respective elements, for 
$j = 1, 2, \ldots, A, B, \ldots$ 

In examples, we represent IAs graphically as in Figure 1. For the remainder 
of this paper, we assume fixed input and output alphabets $I$ and $O$ for IAs, with 
$L = I \cup O$. For (sets of) sequences of actions, $^*$ denotes the Kleene star, and $\epsilon$ 
denotes the empty sequence. We define auxiliary notation in the style of [15]. 


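Definition 2. Let $s \in \mathcal{IA}$, $Q \subseteq Q_s$, $q, q' \in Q_s$, $\ell \in L$ and $\sigma \in L^*$. We define 

\[
\begin{array}{ll}
q \xrightarrow{\epsilon}_s q' \iff q = q' &
q \xrightarrow{\sigma\ell}_s q' \iff \exists r \in Q_s : q \xrightarrow{\sigma}_s r \wedge q' \in T_s(r, \ell) \\
q \xrightarrow{\sigma}_s \iff \exists r \in Q_s : q \xrightarrow{\sigma}_s r &
q \not\xrightarrow{\sigma}_s \iff \neg (q \xrightarrow{\sigma}_s) \\
\text{traces}_s(q) = \{\sigma \in L^* \mid q \xrightarrow{\sigma}_s\} &
Q \text{ after}_s\ \sigma = \{r \in Q_s \mid \exists r' \in Q : r' \xrightarrow{\sigma}_s r\} \\
\text{traces}(s) = \bigcup_{q \in Q_s^0} \text{traces}_s(q) &
s \text{ after } \sigma = Q_s^0 \text{ after}_s\ \sigma \\
\text{out}_s(Q) = \{x \in O \mid \exists q \in Q : q \xrightarrow{x}_s\} &
\text{in}_s(Q) = \{a \in I \mid \forall q \in Q : q \xrightarrow{a}_s\}
\end{array}
\]

\[
\begin{array}{l}
q \text{ is a sink-state of } s \iff \forall \ell \in L : T_s(q, \ell) \subseteq \{q\} \\
s \text{ is input-enabled} \iff \forall q \in Q_s : \text{in}_s(q) = I \\
s \text{ is deterministic} \iff \forall \sigma \in L^* : |s \text{ after } \sigma| \leq 1
\end{array}
\]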


We omit the subscript for interface automaton s when clear from the context. 
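
To make the auxiliary notation concrete, the following minimal Python sketch implements after, out and in for a finite IA whose transition function is stored as a dictionary. The encoding, the function names and the state names `r0` to `r3` are our own assumptions for illustration; the example automaton mimics model r of Figure 1. Since determinism quantifies over all traces, the sketch only checks it up to a bounded trace length.

```python
from itertools import product

def after(T, states, trace):
    """Q after sigma: all states reachable from `states` by following `trace`."""
    current = set(states)
    for label in trace:
        current = {r for q in current for r in T.get((q, label), set())}
    return current

def out(T, outputs, states):
    """Outputs enabled in SOME state of the set (union-like)."""
    return {x for x in outputs if any(T.get((q, x)) for q in states)}

def ins(T, inputs, states):
    """Inputs enabled in ALL states of the set (intersection-like)."""
    return {a for a in inputs if all(T.get((q, a)) for q in states)}

def is_deterministic_upto(T, initial, labels, k):
    """Bounded check of |s after sigma| <= 1 for all traces of length <= k."""
    return all(len(after(T, initial, sigma)) <= 1
               for n in range(k + 1)
               for sigma in product(labels, repeat=n))

# Model r: button ?b leads non-deterministically to coffee or tea with milk.
INPUTS, OUTPUTS = {"?b"}, {"!c+m", "!t+m"}
T_r = {("r0", "?b"): {"r1", "r2"},
       ("r1", "!c+m"): {"r3"},
       ("r2", "!t+m"): {"r3"}}

reached = after(T_r, {"r0"}, ["?b"])
```

Note that `ins` applied to the empty set returns all inputs, in line with the universal quantifier in the definition.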

We use IAs to represent black-box systems, which can produce outputs, and 
consume or refuse inputs from the environment. This entails a notion of observ- 
able behaviour, which we define in terms of input-failure traces [11]. 


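Definition 3. For any input action $a$, we denote the input-failure of $a$ as $\overline{a}$. 
Likewise, for any set of inputs $A$, we define $\overline{A} = \{\overline{a} \mid a \in A\}$. The domain of 
input-failure traces is defined as $\mathcal{FT}_{I,O} = L^* \cup L^* \cdot \overline{I}$. For $s \in \mathcal{IA}$, we define 

\[
\text{Ftraces}(s) = \text{traces}(s) \cup \{\sigma\overline{a} \mid \sigma \in L^*, a \in I, a \notin \text{in}(s \text{ after } \sigma)\}
\]

Thus, a trace $\sigma\overline{a}$ indicates that $\sigma$ leads to a state where $a$ is not accepted, 
e.g. a greyed-out button which cannot be clicked. 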

Any such set of input-failure traces is prefix-closed. Input-failure traces are 
the basis of input-failure refinement, which we will now explain briefly. This 
refinement relation was introduced in [11] to bridge the gap between alternating 
refinements [1,2] and ioco theory [15]. Similarly to normal trace inclusion, the 
idea is that an implementation may only show a trace if a specification also 
shows this trace. Moreover, the most permissive treatment of an input is to fail 
it, so if a specification allows an input failure, then it also must allow acceptance 
of that input, as expressed by the input-failure closure. 


Definition 4. A set $S \subseteq \mathcal{FT}_{I,O}$ of input-failure traces is input-failure closed if, 
for all $\sigma \in L^*$, $a \in I$ and $\rho \in \mathcal{FT}_{I,O}$, $\sigma\overline{a} \in S \implies \sigma a \rho \in S$. The 
input-failure closure of $S$ is the smallest input-failure closed superset of $S$, that is, 
$\text{fcl}(S) = S \cup \{\sigma a \rho \mid \sigma\overline{a} \in S, \rho \in \mathcal{FT}_{I,O}\}$. 
Input-failure refinement and input-failure equivalence on IAs are respectively 
defined as 

\[
\begin{array}{l}
s_1 \leq_{if} s_2 \iff \text{Ftraces}(s_1) \subseteq \text{fcl}(\text{Ftraces}(s_2)), \text{ and} \\
s_1 \equiv_{if} s_2 \iff s_1 \leq_{if} s_2 \wedge s_2 \leq_{if} s_1.
\end{array}
\]


The input-failure closure of the Ftraces serves as a canonical representation 
of the behaviour of an IA. That is, two models are input-failure equivalent if 
and only if the closure of their input-failure traces is the same, as stated in 
Proposition 5. 


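Proposition 5. [11] Let $s_1, s_2 \in \mathcal{IA}$. Then 

\[
\begin{array}{l}
s_1 \leq_{if} s_2 \iff \text{fcl}(\text{Ftraces}(s_1)) \subseteq \text{fcl}(\text{Ftraces}(s_2)) \\
s_1 \equiv_{if} s_2 \iff \text{fcl}(\text{Ftraces}(s_1)) = \text{fcl}(\text{Ftraces}(s_2))
\end{array}
\]

Proposition 5 implies that relation $\leq_{if}$ is reflexive ($s \leq_{if} s$) and transitive 
($s_1 \leq_{if} s_2 \wedge s_2 \leq_{if} s_3 \implies s_1 \leq_{if} s_3$). Formally, it is thus a preorder, making it 
suitable for stepwise refinement. 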
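
As an illustration of Definitions 3 and 4, the sketch below decides membership of a single input-failure trace in Ftraces and in its input-failure closure, and uses this for a refinement check bounded by trace length. It is a minimal Python sketch under our own assumptions: an IA is a hypothetical triple `(T, Q0, inputs)`, a failure of input a is encoded as the pair `("fail", a)`, and the bound `k` makes the check an approximation of the actual relation.

```python
from itertools import product

def after(T, states, trace):
    current = set(states)
    for label in trace:
        current = {r for q in current for r in T.get((q, label), set())}
    return current

def ins(T, inputs, states):
    return {a for a in inputs if all(T.get((q, a)) for q in states)}

def in_ftraces(ia, trace):
    """Membership in Ftraces: a plain trace must be executable; a trace
    ending in a failure of `a` requires `a` not to be in in(s after sigma)."""
    T, Q0, inputs = ia
    if trace and isinstance(trace[-1], tuple):
        sigma, a = trace[:-1], trace[-1][1]
        return a not in ins(T, inputs, after(T, Q0, sigma))
    return bool(after(T, Q0, trace))

def in_fcl(ia, trace):
    """Membership in fcl(Ftraces): either in Ftraces, or some input in the
    trace may be failed at that point, which permits any continuation."""
    if in_ftraces(ia, trace):
        return True
    inputs = ia[2]
    return any(trace[i] in inputs
               and in_ftraces(ia, trace[:i] + (("fail", trace[i]),))
               for i in range(len(trace)))

def refines_upto(impl, spec, labels, k):
    """Bounded check of Ftraces(impl) being included in fcl(Ftraces(spec))."""
    for n in range(k + 1):
        for sigma in product(labels, repeat=n):
            for t in [sigma] + [sigma + (("fail", a),) for a in impl[2]]:
                if in_ftraces(impl, t) and not in_fcl(spec, t):
                    return False
    return True

INPUTS = {"?a", "?b"}
LABELS = ["?a", "?b", "!c", "!t", "!t+m"]
spec = ({("q0", "?b"): {"q1"}, ("q1", "!t"): {"q2"}, ("q1", "!t+m"): {"q2"}},
        {"q0"}, INPUTS)
good = ({("i0", "?b"): {"i1"}, ("i1", "!t+m"): {"i2"},
         ("i0", "?a"): {"i3"}, ("i3", "!c"): {"i3"}}, {"i0"}, INPUTS)
bad = ({("j0", "?b"): {"j1"}, ("j1", "!c"): {"j2"}}, {"j0"}, INPUTS)
```

Here `good` is accepted although it reacts to the unspecified input ?a, because the specification allows ?a to fail and the closure therefore permits any behaviour after it, whereas `bad` is rejected for producing !c after ?b.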


3 Alternating Interface Automata 


Real software systems are always in a single state, but the precise state of a sys- 
tem cannot always be derived from an observed trace. Due to non-determinism, 
a trace may lead to multiple states. In IAs, this is modelled as a set of states, 
such as the set of initial states, the set $T(q, \ell)$ for state $q$ and action $\ell$, and the 
set $s \text{ after } \sigma$ for IA $s$ and trace $\sigma$. The domain of such non-deterministic views 
on an IA with states $Q$ is thus the powerset of states, $\mathcal{P}(Q)$. In a set of states $Q$, 
traces from any individual state in $Q$ may be observed. 


3.1 Alternation 


Alternation generalizes this view on automata: a system may not only be non- 
deterministically in multiple states, but also conjunctively. When conjunctively 
in multiple states, only traces which are in all these states may be observed. 
Alternation is formalized by exchanging the domain $\mathcal{P}(Q)$ for the domain $\mathcal{D}(Q)$. 
Formally, $\mathcal{D}(Q)$ is the free distributive lattice, which exists for any set $Q$ [14]. 


Definition 6. For any set $Q$, $\mathcal{D}(Q)$ denotes the free distributive lattice generated 
by $Q$. That is, $\mathcal{D}(Q)$ is the domain of equivalence classes of terms, inductively 
defined by the grammar 

\[
e = \top \mid \bot \mid \langle q \rangle \mid e_1 \vee e_2 \mid e_1 \wedge e_2 \quad \text{with } q \in Q,
\]


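where equivalence of terms is completely defined by the following axioms: 

\[
\begin{array}{lll}
e_1 \vee e_2 = e_2 \vee e_1 & e_1 \wedge e_2 = e_2 \wedge e_1 & \text{[Commutativity]} \\
e_1 \vee (e_2 \vee e_3) = (e_1 \vee e_2) \vee e_3 & e_1 \wedge (e_2 \wedge e_3) = (e_1 \wedge e_2) \wedge e_3 & \text{[Associativity]} \\
e_1 \vee (e_1 \wedge e_2) = e_1 & e_1 \wedge (e_1 \vee e_2) = e_1 & \text{[Absorption]} \\
e \vee e = e & e \wedge e = e & \text{[Idempotence]} \\
e_1 \vee (e_2 \wedge e_3) = (e_1 \vee e_2) \wedge (e_1 \vee e_3) & e_1 \wedge (e_2 \vee e_3) = (e_1 \wedge e_2) \vee (e_1 \wedge e_3) & \text{[Distributivity]} \\
e \vee \top = \top & e \wedge \bot = \bot & \text{[Identity]}
\end{array}
\]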


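In short, $(\mathcal{D}(Q), \vee, \wedge, \bot, \top)$ forms a distributive lattice. Expression $\langle q \rangle$ is 
named the embedding of $q$ in $\mathcal{D}(Q)$, and operators $\vee$ and $\wedge$ are named disjunction 
and conjunction, respectively. For the remainder of this paper, we make no 
distinction between expressions and their equivalence classes. 

For finite $n$, we introduce the shorthand n-ary operators $\bigvee$ and $\bigwedge$, as follows: 

\[
\begin{array}{ll}
\bigvee \{e_1, e_2, \ldots, e_n\} = e_1 \vee e_2 \vee \ldots \vee e_n & \bigvee \emptyset = \bot \\
\bigwedge \{e_1, e_2, \ldots, e_n\} = e_1 \wedge e_2 \wedge \ldots \wedge e_n & \bigwedge \emptyset = \top
\end{array}
\]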


We distinguish the embedding $\langle q \rangle \in \mathcal{D}(Q)$ from $q$ itself. We require this 
distinction only in Definition 18, where we will point this out. Otherwise, we do 
not need this distinction, so we write $q$ instead of $\langle q \rangle$. 

Intuitively, disjunction $q_1 \vee q_2$ replaces the non-deterministic set $\{q_1, q_2\}$. This 
is formalized by extending IAs with alternation. 
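
For experimentation, elements of the lattice can be given canonical representatives. The following minimal Python sketch uses one standard choice, which is our own assumption and not part of the formal development: a configuration is stored in disjunctive normal form as a set of clauses, each clause being a set of states read as a conjunction, with absorbed clauses removed. All names are hypothetical.

```python
BOT = frozenset()                  # no clause at all: the empty disjunction
TOP = frozenset({frozenset()})     # one empty clause: the empty conjunction

def atom(q):
    """Embedding of a state q as a configuration."""
    return frozenset({frozenset({q})})

def _minimal(clauses):
    """Remove every clause that strictly contains another one (absorption)."""
    clauses = set(clauses)
    return frozenset(c for c in clauses if not any(d < c for d in clauses))

def join(e1, e2):
    """Disjunction: union of the clause sets."""
    return _minimal(e1 | e2)

def meet(e1, e2):
    """Conjunction: pairwise union of clauses (distributivity)."""
    return _minimal(c | d for c in e1 for d in e2)

a, b, c = atom("a"), atom("b"), atom("c")

# Two syntactically different terms yield the same representative:
lhs = join(a, meet(b, c))
rhs = meet(join(a, b), join(a, c))
```

Because equal terms are mapped to equal Python values, the axioms of Definition 6 can be checked by plain equality, for instance `lhs == rhs` for distributivity.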




Definition 7. An alternating interface automaton (AIA) is defined as a 5-tuple 
$(Q, I, O, T, e^0)$ where 


— $Q$ is a set of states, and elements of $\mathcal{D}(Q)$ are referred to as configurations, 

— $I$ and $O$ are disjoint sets of input and output actions, respectively, 

— $T : Q \times (I \cup O) \to \mathcal{D}(Q)$ is a transition function, with $T(q, a) \neq \bot$ for all 
$a \in I$, and 

— $e^0 \in \mathcal{D}(Q)$ is the initial configuration. 


The domain of AIAs is denoted by $\mathcal{AIA}$. Notations for IAs are reused for 
AIAs, if this causes no ambiguity. For $\ell \in L$, we define $T_\ell : Q \to \mathcal{D}(Q)$ by 
$T_\ell(q) = T(q, \ell)$. 


Configurations $\top$ and $\bot$ are analogous to the empty set of states in an IA 
$s$: if $T_s(q, \ell) = \emptyset$, this means that state $q$ does not have a transition for $\ell$. In 
terms of input-failure refinement, not having a transition for an input means 
that the input is underspecified, whereas not having a transition for an output 
means that the output is forbidden. This distinction is made explicit in AIAs by 
using $\top$ to represent underspecification and $\bot$ to represent forbidden behaviour. 
We will formalize this in Section 3.2. Definition 7 also allows output transitions 
to $\top$, meaning that the behaviour is unspecified after that output. Automata 
models which do not allow distinct configurations $\top$ and $\bot$ commonly represent 
such underspecified behaviour with an explicit chaotic state [3,4] instead. 

We graphically represent AIAs in a similar way as IAs, with some additional 
rules. A transition $T(q^0, \ell) = \langle q^1 \rangle$ is represented by a single arrow from $q^0$ to $q^1$. 
We represent $T(q^0, \ell) = q^1 \vee q^2$ by two arrows $q^0 \xrightarrow{\ell} q^1$ and $q^0 \xrightarrow{\ell} q^2$, analogous 
to non-determinism in IAs. Conjunction $T(q^0, \ell) = q^1 \wedge q^2$ is shown by adding an 
arc between the arrows. Nested expressions are represented by successive splits, 
as shown in Example 8. A state $q$ without outgoing arrow for an output $\ell \in O$ 
represents $T(q, \ell) = \bot$, and a state without input transitions for input $\ell$ indicates 
$T(q, \ell) = \top$. For $\ell \in O$, a transition $T(q, \ell) = \top$ is shown with an arrow to $\top$, 
denoting underspecification, but note that $\top$ is a configuration, not a state. 


Example 8. Figure 3 shows AIA $s_A$, with $Q_A = \{q_A^0, q_A^1, q_A^2\}$, $I = \{?a, ?b\}$, 
$O = \{!x, !y\}$, $e_A^0 = q_A^0$ and $T$ given by the following table: 

[Table: transition function $T$ of $s_A$, with one row per state and one column per action ?a, ?b, !x, !y; the entries are not legible in this copy.] 

Moreover, AIA $s_B$ combines the partial specifications from Section 1. 


Fig. 3. Example AIAs $s_A$ and $s_B$. 


Before defining trace semantics for AIAs, we extend the transition function
from single actions to sequences of actions, by defining an after-function on AIAs.
This function transforms configurations by substituting every state according to
the transition function, similarly to the approach for alternating automata in [6].

Definition 9. Let $f : Q \to D(Q)$ and $e \in D(Q)$. Then substitution $e[f]$ is equal
to $e$ with all atomic propositions $q$ replaced by $f(q)$. Formally, $[f] : D(Q) \to D(Q)$
is a postfix operator defined by
\[ (e_1 \vee e_2)[f] = e_1[f] \vee e_2[f] \qquad (e_1 \wedge e_2)[f] = e_1[f] \wedge e_2[f] \]
\[ \top[f] = \top \qquad \bot[f] = \bot \qquad \langle q \rangle[f] = f(q) \]


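Definition 10. Let $s \in \mathcal{AIA}$. We define $\mathit{after}_s : D(Q_s) \times L^* \to D(Q_s)$ as
\[ e \;\mathit{after}_s\; \epsilon = e \qquad e \;\mathit{after}_s\; (\ell \cdot \sigma) = e[T_\ell] \;\mathit{after}_s\; \sigma \]
where $T_\ell$ denotes the function $q \mapsto T(q,\ell)$.

Like before, we omit the subscript if clear from the context. We also define
$(s \;\mathit{after}\; \sigma) = e_s^0 \;\mathit{after}_s\; \sigma$.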
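
The substitution and after-function can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: configurations are encoded as nested tuples, the helper names (`st`, `conj`, `disj`, `subst`, `after`) are hypothetical, and the toy transition table is invented for the example. A missing table entry is read as $\top$ for an input and $\bot$ for an output, following the graphical convention described before Example 8.

```python
# Configurations of D(Q) as nested tuples (a hypothetical encoding).
TOP, BOT = ("top",), ("bot",)

def st(q):
    """Embed a state name q as the configuration <q>."""
    return ("st", q)

def conj(a, b):
    """a AND b, folding constants: BOT absorbs, TOP is neutral."""
    if BOT in (a, b):
        return BOT
    if a == TOP:
        return b
    if b == TOP:
        return a
    return ("and", a, b)

def disj(a, b):
    """a OR b, folding constants: TOP absorbs, BOT is neutral."""
    if TOP in (a, b):
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    return ("or", a, b)

def subst(e, f):
    """e[f] as in Definition 9: replace every state q in e by f(q)."""
    if e in (TOP, BOT):
        return e
    if e[0] == "st":
        return f(e[1])
    left, right = subst(e[1], f), subst(e[2], f)
    return conj(left, right) if e[0] == "and" else disj(left, right)

def after(e, trace, T, inputs):
    """e after trace, as in Definition 10. T maps (state, label) to a
    configuration; a missing entry means TOP for inputs, BOT for outputs."""
    for label in trace:
        default = TOP if label in inputs else BOT
        e = subst(e, lambda q: T.get((q, label), default))
    return e

# Hypothetical toy AIA (not taken from the paper's figures).
INPUTS = {"?a"}
T = {
    ("q0", "?a"): conj(st("q1"), st("q2")),
    ("q1", "!x"): st("q0"),
    ("q1", "!y"): st("q0"),
    ("q2", "!x"): TOP,
}

print(after(st("q0"), ["?a", "!x"], T, INPUTS))  # q2 leaves the rest open
print(after(st("q0"), ["?a", "!y"], T, INPUTS))  # q2 forbids !y
```

Folding the constants in `conj` and `disj` is enough to decide whether a result equals $\top$ or $\bot$, because states act as independent generators of the lattice.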


Example 11. Consider $s_B$ in Figure 3. We evaluate $s_B$ after ?on ?b !t, as follows:
q% after ?on?b!t = g}[Ton] after ?b!t = T (q9, ?on) after ?b!t 
= (q ^ qh ^ dh ^ Gp) after ?b lt = (qp A^ dh A ap A dR) [To] after !t 


( 
= (T Aq A (qh V dh) A qB) after !t = (q8 A (ah V ab) A a) [Tit] 
(TA(LvVL)A^gp)=L 


Intuitively, this means that giving tea without milk after ?on ?b is forbidden.
In contrast, tea with milk is allowed, and leads to a single-state configuration:


q% after ?on ?b!t-+m = (q% A (q$ V qb) Ag) [Tim] = TA(LVT)A ae? = q 


3.2 Input-Failure Semantics for AIAs 


IAs are equipped with input-failure semantics, based on the traces and
underspecified inputs of the IA. We lift this to AIAs via the after-function, using that
$\bot$ indicates forbidden behaviour, and $\top$ indicates underspecified behaviour.


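Definition 12. Let $s, s' \in \mathcal{AIA}$, and $e \in D(Q_s)$. Then we define

\[ \mathit{Ftraces}_s(e) = \{\sigma \in L^* \mid (e \;\mathit{after}_s\; \sigma) \neq \bot\} \cup \{\sigma\overline{a} \in L^* \cdot \overline{I} \mid (e \;\mathit{after}_s\; \sigma a) = \top\} \]
\[ \mathit{Ftraces}(s) = \mathit{Ftraces}_s(e_s^0) \]
\[ s \leq_{if} s' \iff \mathit{Ftraces}(s) \subseteq \mathit{Ftraces}(s') \]
\[ s \equiv_{if} s' \iff \mathit{Ftraces}(s) = \mathit{Ftraces}(s') \]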
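
To make Definition 12 concrete, the following Python sketch enumerates input-failure traces up to a bounded length and compares two specifications. It rests on assumptions that are not from the paper: the tuple encoding of configurations, the `~a` notation for a refusal of input `a`, the function names, and the two toy transition tables. Since the enumeration is bounded, set inclusion of the results is only a necessary condition for refinement, not a proof of it.

```python
TOP, BOT = ("top",), ("bot",)

def conj(a, b):
    """a AND b with constants folded."""
    if BOT in (a, b):
        return BOT
    return b if a == TOP else a if b == TOP else ("and", a, b)

def disj(a, b):
    """a OR b with constants folded."""
    if TOP in (a, b):
        return TOP
    return b if a == BOT else a if b == BOT else ("or", a, b)

def step(e, label, T, inputs):
    """e after a single label: substitute every state by its transition.
    A missing entry means TOP for an input and BOT for an output."""
    if e in (TOP, BOT):
        return e
    if e[0] == "st":
        return T.get((e[1], label), TOP if label in inputs else BOT)
    l, r = step(e[1], label, T, inputs), step(e[2], label, T, inputs)
    return conj(l, r) if e[0] == "and" else disj(l, r)

def ftraces(e0, T, inputs, outputs, depth):
    """Traces sigma with len(sigma) <= depth that do not lead to BOT, plus
    the refusals sigma + ('~a',) for inputs a that lead to TOP after sigma."""
    labels = sorted(inputs | outputs)
    found, frontier = set(), [((), e0)]
    for _ in range(depth + 1):
        successors = []
        for sigma, e in frontier:
            if e == BOT:                  # forbidden: sigma is not an Ftrace
                continue
            found.add(sigma)
            for a in sorted(inputs):
                if step(e, a, T, inputs) == TOP:   # underspecified input
                    found.add(sigma + ("~" + a,))
            successors += [(sigma + (l,), step(e, l, T, inputs)) for l in labels]
        frontier = successors
    return found

# Two hypothetical specifications over the same alphabet.
INPUTS, OUTPUTS = {"?a"}, {"!x", "!y"}
T1 = {("p0", "?a"): ("st", "p1"), ("p1", "!x"): ("st", "p0")}
T2 = {("r0", "?a"): ("st", "r1"),
      ("r1", "!x"): ("st", "r0"), ("r1", "!y"): ("st", "r0")}

F1 = ftraces(("st", "p0"), T1, INPUTS, OUTPUTS, 3)
F2 = ftraces(("st", "r0"), T2, INPUTS, OUTPUTS, 3)
print(F1 <= F2, F2 <= F1)
```

In this toy instance the second specification additionally allows `!y` after `?a`, so its bounded trace set strictly contains that of the first.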


Compare Definition 4 and Definition 12 for input-failure refinement for IAs
and for AIAs. For AIAs, refinement is defined directly over their Ftraces, whereas
for IAs, the input-failure closure of the Ftraces is used for the right-hand model
(and optionally for the left-hand model, according to Proposition 5). In this
regard, AIAs are a more direct and natural representation of input-failure traces,
since the input-failure closure is not needed.


Proposition 13. For $s \in \mathcal{AIA}$, Ftraces(s) is input-failure closed.

Another motivation to represent input-failure traces with AIAs is the connection
between the distributive lattice $D(Q)$ and the lattice of sets of input-failure
traces: $\wedge$ and $\vee$ are connected to intersection and union of input-failure traces,
respectively, and $\top$ and $\bot$ represent the largest and smallest possible input-failure
trace sets.


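Proposition 14. Let $s \in \mathcal{AIA}$, and $e, e' \in D(Q_s)$. Then

1. $\mathit{Ftraces}(e \wedge e') = \mathit{Ftraces}(e) \cap \mathit{Ftraces}(e')$
2. $\mathit{Ftraces}(e \vee e') = \mathit{Ftraces}(e) \cup \mathit{Ftraces}(e')$
3. $\mathit{Ftraces}(\bot) = \emptyset$
4. $\mathit{Ftraces}(\top) = \mathit{FT}_{I,O}$
5. $\mathit{Ftraces}(e) = \{\epsilon\} \cup \{\overline{a} \in \overline{I} \mid e \;\mathit{after}\; a = \top\} \cup \left(\bigcup_{\ell \in L} \ell \cdot \mathit{Ftraces}(e \;\mathit{after}\; \ell)\right)$ if $e \neq \bot$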


Propositions 14.3 and 14.5 show why Definition 7 does not allow transitions
$T(q,a) = \bot$ for an input $a$: in that case, Ftraces(q) would contain trace $\epsilon$, but
it would contain neither extension $a$ nor $\overline{a}$ of $\epsilon$, meaning that after trace $\epsilon$ it is
allowed neither to accept nor to refuse $a$.

We can lift configurations $\top$ and $\bot$, as well as $\wedge$ and $\vee$, to the level of AIAs.
This provides the building blocks to compose specifications. Specifications $s_\top$
and $s_\bot$ can be used to specify that any or no behaviour is considered correct,
respectively. The operators $\wedge$ and $\vee$ on specifications fulfill the same role as
existing operators in substitutivity refinement [7], and have similar properties,
described in Proposition 14.


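Definition 15. Let $s_1, s_2 \in \mathcal{AIA}$. Without loss of generality¹, assume that $Q_1$
and $Q_2$ are disjoint. We define

\[ s_\top = (\emptyset, I, O, \emptyset, \top) \qquad s_1 \wedge s_2 = (Q_1 \cup Q_2, I, O, T_1 \cup T_2, e_1^0 \wedge e_2^0) \]
\[ s_\bot = (\emptyset, I, O, \emptyset, \bot) \qquad s_1 \vee s_2 = (Q_1 \cup Q_2, I, O, T_1 \cup T_2, e_1^0 \vee e_2^0) \]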


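Proposition 16. Let $i, i', s, s' \in \mathcal{AIA}$. Then

1. $i \leq_{if} s$ and $i \leq_{if} s' \iff i \leq_{if} (s \wedge s')$
2. $i \leq_{if} s$ or $i \leq_{if} s' \implies i \leq_{if} (s \vee s')$
3. $i \leq_{if} s$ and $i' \leq_{if} s \iff (i \vee i') \leq_{if} s$
4. $i \leq_{if} s$ or $i' \leq_{if} s \implies (i \wedge i') \leq_{if} s$
5. $i \leq_{if} s_\top$
6. $i \not\leq_{if} s_\bot$ if $e_i^0 \neq \bot$

The converse of statement (2) does not hold. As a counter-example, choose
$\mathit{Ftraces}(i) = \{\epsilon, x, y\}$, $\mathit{Ftraces}(s_1) = \{\epsilon, x\}$ and $\mathit{Ftraces}(s_2) = \{\epsilon, y\}$. In that
case, $i \leq_{if} s_1 \vee s_2$ holds, but $i \not\leq_{if} s_1$ and $i \not\leq_{if} s_2$. The converse of statement (4)
can be disproven similarly.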
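
The counter-example can be checked mechanically by treating each specification as its set of input-failure traces and using the fact that disjunction corresponds to union (Proposition 14.2 lifted to specifications). The Python encoding below, with traces as tuples and the helper `refines`, is an assumption made for illustration only.

```python
# Traces are tuples of labels; the empty tuple plays the role of epsilon.
EPS = ()
ft_i = {EPS, ("x",), ("y",)}
ft_s1 = {EPS, ("x",)}
ft_s2 = {EPS, ("y",)}

def refines(left, right):
    """Input-failure refinement on trace sets: plain set inclusion."""
    return left <= right

# Disjunction of specifications corresponds to the union of their Ftraces.
ft_s1_or_s2 = ft_s1 | ft_s2

print(refines(ft_i, ft_s1_or_s2))                    # i refines s1 v s2
print(refines(ft_i, ft_s1), refines(ft_i, ft_s2))    # but neither disjunct
```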


1 If $Q_1$ and $Q_2$ are not disjoint, the disjoint union $Q_1 \uplus Q_2$ can be used instead of
$Q_1 \cup Q_2$. The transition functions of $s_1 \wedge s_2$ and $s_1 \vee s_2$ should be adjusted accordingly.

3.3 AIA Determinization


In case of nestings of $\wedge$ and $\vee$, the after-set $s \;\mathit{after}\; \sigma$ may not be immediately
clear, so a transition function producing configurations without $\wedge$ and $\vee$
is easier to interpret. For this reason, we lift the notions of determinism and
determinization from IAs [11] to the alternating setting.


Definition 17. Let $s \in \mathcal{AIA}$ and $e \in D(Q_s)$. Then $e$ is deterministic if $e = \top$
or $e = \bot$ or $e = \langle q \rangle$ for some $q \in Q_s$. Furthermore, $s$ is deterministic if for all
$\sigma \in L^*$, configuration $s \;\mathit{after}\; \sigma$ is deterministic.


Compare the notions of determinism for IAs and AIAs. For every trace $\sigma$, a
deterministic IA $s$ is in a singleton state $(s \;\mathit{after}\; \sigma) = \{q\}$, unless $(s \;\mathit{after}\; \sigma) = \emptyset$
(that is, $\sigma$ is not a trace of $s$). For AIAs, this singleton set $\{q\}$ is replaced by
the embedding $\langle q \rangle$, and $\emptyset$ is replaced by $\top$ or $\bot$, depending on whether this set
was reached by an underspecified action or a forbidden action.

We now define determinization, where we require the distinction between $\langle q \rangle$
and $q$ to avoid ambiguity.


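Definition 18. Let $s \in \mathcal{AIA}$. We define $\mathit{det} : D(Q_s) \to D(D(Q_s) \setminus \{\top, \bot\})$ as

\[ \mathit{det}(e) = \begin{cases} \top & \text{if } e = \top \\ \bot & \text{if } e = \bot \\ \langle e \rangle & \text{otherwise} \end{cases} \]

The determinization of $s$, or $\mathit{det}(s) \in \mathcal{AIA}$, is defined as

\[ \mathit{det}(s) = (D(Q_s) \setminus \{\top, \bot\}, I, O, T_{\mathit{det}(s)}, \mathit{det}(e_s^0)), \text{ with} \]
\[ T_{\mathit{det}(s)}(e, \ell) = \mathit{det}(e \;\mathit{after}_s\; \ell) \text{ for } \ell \in L \]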


Proposition 19. For $s \in \mathcal{AIA}$, det(s) is deterministic.


Example 20. Figure 4 shows (the reachable part of) the determinizations of s 4 
and sg from Figure 3. In det(s4), state qù A q3 has no outgoing !x-transition. 
This expresses Tyct(s ,) (44 \q4;!x) = L, which is because q% has no «-transition, 
so Ta(q%,!x) = L. In contrast, state qù A q4 has an outgoing ?a-transition, 
Taet(sa) (G4 A qå, 2a) Æ T, because qù has an ?a-transition, Ta (q), ?a) 4 T. 


Example 20 shows that an input is specified by a conjunction of states in
the determinization if any of the individual states specifies this input, whereas an
output is allowed by a conjunction of states only if all of the individual states allow
this output. In the setting of IA, [11] already established that this works in a
reversed way for non-determinism, following their definition of determinization:
all individual states of a disjunction should specify an input to specify it in
the determinization, and any individual state should allow an output to allow
it in the determinization. Their so-called input-universal determinization is an
instance of the determinization from Definition 18, using only disjunctions.

Fig. 4. Examples of determinization.


This duality arises from Definition 10 of after, since the determinization 
directly represents the after-function: the determinizations in Example 20 corre- 
spond to the after-sets such as those derived in Example 11. This correspondence 
is formalized in Proposition 21. 


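Proposition 21. Let $s \in \mathcal{AIA}$ and $\sigma \in L^*$. Then

\[ (\mathit{det}(s) \;\mathit{after}\; \sigma) = \mathit{det}(s \;\mathit{after}\; \sigma). \]

Proposition 22. Let $s \in \mathcal{AIA}$. Then $\mathit{Ftraces}(s) = \mathit{Ftraces}(\mathit{det}(s))$.

Corollary 23. Let $s \in \mathcal{AIA}$. Then $s \equiv_{if} \mathit{det}(s)$.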


A known result [6] is that alternating automata are exponentially more suc- 
cinct than non-deterministic automata, and double exponentially more succinct 
than deterministic automata. Although alternating automata are not a special 
case of AIAs (as AIAs lack the accepting and non-accepting states of alternating 
automata), we expect AIAs to be exponentially more succinct than IAs, as well. 


3.4 Connections between IAs and AIAs 


IAs and AIAs are used to represent sets of input-failure traces, and are in that 
sense interchangeable. First, we show that any IA can be translated to an AIA. 


Definition 24. For $s \in \mathcal{IA}$, the AIA induced by $s$ is defined as $\mathit{AIA}(s) =
(Q_s, I_s, O_s, T, \bigvee Q_s^0) \in \mathcal{AIA}$, where for all $q \in Q_s$ and $\ell \in L$:

\[ T(q, \ell) = \begin{cases} \top & \text{if } \ell \in I \text{ and } T_s(q, \ell) = \emptyset \\ \bigvee T_s(q, \ell) & \text{otherwise} \end{cases} \]

Proposition 25. Let $s \in \mathcal{IA}$. Then $\mathit{Ftraces}(\mathit{AIA}(s)) = \mathit{fcl}(\mathit{Ftraces}(s))$.

Corollary 26. Let $s_1, s_2 \in \mathcal{IA}$. Then $s_1 \leq_{if} s_2 \iff \mathit{AIA}(s_1) \leq_{if} \mathit{AIA}(s_2)$.

Definition 24 formalizes how disjunction in an AIA corresponds to
non-determinism in IA. Specifically, if no transitions are present for some output
in an IA, then the transition function of the corresponding AIA gives $\bigvee \emptyset = \bot$
for this output, analogous to the explicit case $\top$ for inputs. Note that the
graphical representation of an IA and that of its induced AIA are the same.

The translation from AIAs to IAs is more involved. For disjunctions of states
$(q \;\mathit{after}\; \ell) = q_1 \vee q_2$, the translation of Definition 24 can simply be inverted, but
this is not possible for conjunctions. As such, we represent any configuration by
its unique disjunctive normal form.


Definition 27. Let $e \in D(Q)$. Then DNF(e) is the smallest set in $\mathcal{P}(\mathcal{P}(Q))$
such that $e = \bigvee \{\bigwedge Q' \mid Q' \in \mathit{DNF}(e)\}$.


The set DNF(e) can be constructed by using the axioms from Definition 6. 


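Example 28. To find $\mathit{DNF}(q^1 \vee (q^2 \wedge (q^1 \vee q^3)))$, we first rewrite the expression
by using distributivity, associativity, commutativity and absorption, as follows:

\[ q^1 \vee (q^2 \wedge (q^1 \vee q^3)) = q^1 \vee (q^2 \wedge q^1) \vee (q^2 \wedge q^3) = q^1 \vee (q^2 \wedge q^3) \]

So we find $\mathit{DNF}(q^1 \vee (q^2 \wedge (q^1 \vee q^3))) = \{\{q^1\}, \{q^2, q^3\}\}$. Two other examples
are $\mathit{DNF}(\bot) = \mathit{DNF}(\bigvee \emptyset) = \emptyset$ and $\mathit{DNF}(\top) = \mathit{DNF}(\bigvee \{\bigwedge \emptyset\}) = \{\emptyset\}$.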
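
A possible way to compute such a normal form is sketched below in Python. The tuple encoding of configurations and the function name `dnf` are assumptions made for this illustration; the sketch distributes conjunction over disjunction and then applies absorption by discarding every clause that strictly contains another clause.

```python
TOP, BOT = ("top",), ("bot",)

def st(q):
    """Embed a state name as a configuration."""
    return ("st", q)

def dnf(e):
    """Disjunctive normal form of e as a set of frozensets of state names."""
    tag = e[0]
    if tag == "top":
        return {frozenset()}          # the empty conjunction
    if tag == "bot":
        return set()                  # the empty disjunction
    if tag == "st":
        return {frozenset([e[1]])}
    left, right = dnf(e[1]), dnf(e[2])
    if tag == "or":
        clauses = left | right
    else:                             # "and": distribute over the clauses
        clauses = {a | b for a in left for b in right}
    # Absorption: a clause is redundant if a strictly smaller clause exists.
    return {c for c in clauses if not any(d < c for d in clauses)}

# The configuration q1 v (q2 ^ (q1 v q3)) discussed in Example 28.
example = ("or", st("q1"), ("and", st("q2"), ("or", st("q1"), st("q3"))))
print(dnf(example))
```

Representing clauses as frozensets makes associativity, commutativity and idempotence implicit, so only distributivity and absorption need explicit code.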
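Definition 29. Let $s \in \mathcal{AIA}$. Then the induced IA of $s$ is defined as
$\mathit{IA}(s) = (\mathcal{P}(Q_s), I, O, T, \mathit{DNF}(e_s^0)) \in \mathcal{IA}$, with for $Q \subseteq Q_s$ and $\ell \in L$:

\[ T(Q, \ell) = \begin{cases} \mathit{DNF}(\bigwedge Q \;\mathit{after}_s\; \ell) \setminus \{\emptyset\} & \text{if } \ell \in I \\ \mathit{DNF}(\bigwedge Q \;\mathit{after}_s\; \ell) & \text{if } \ell \in O \end{cases} \]

A state of IA(s) acts as the conjunction of the corresponding states in $s$. In
particular, a singleton state $\{q\}$ in IA(s) acts as the contained state $q$ in $s$, and
state $\emptyset$ in IA(s) acts as a chaotic state, having $\mathit{Ftraces}_{\mathit{IA}(s)}(\emptyset) = \mathit{FT}_{I,O}$.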


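Proposition 30. Let $s \in \mathcal{AIA}$. Then $\mathit{Ftraces}(s) = \mathit{fcl}(\mathit{Ftraces}(\mathit{IA}(s)))$.


Corollary 31. Let $s_1, s_2 \in \mathcal{AIA}$. Then $s_1 \leq_{if} s_2 \iff \mathit{IA}(s_1) \leq_{if} \mathit{IA}(s_2)$.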


4 Testing Input-Failure Refinement 


So far, we have introduced refinement as a way of specifying correctness of one 
model with respect to another. Often, a specification is indeed a model, but we 
use it to ensure correctness of a real-world software implementation. To this end, 
we assume that this implementation behaves like an IA. We cannot see the actual
states and transitions of this IA, but we can provide inputs to it and observe its 
outputs. We assume that this IA must have an initial state, i.e. it is non-empty. 


Definition 32. [1] An IA $i$ is empty if $Q_i^0 = \emptyset$.

In this section, we introduce a basis for model-based testing with AIAs,
analogously to ioco test case generation [15]. Given a specification AIA, we derive
a testing experiment on non-empty implementation IAs, in order to observe 
whether input-failure refinement holds with respect to the specification. This 
requires an extension of input-failure refinement to these domains. 


Definition 33. Let $i \in \mathcal{IA}$ and $s \in \mathcal{AIA}$. Then

\[ i \leq_{if} s \iff \mathit{Ftraces}(i) \subseteq \mathit{Ftraces}(s). \]


4.1 Testers for AIA Specifications 


From a given specification AIA, we derive a tester. We model this tester as an 
IA as well, which can communicate with an implementation IA through a form 
of parallel composition. The tester eventually concludes a verdict, indicating 
whether the observed behaviour is allowed. To communicate, the inputs of the 
implementation must be outputs for the tester, and vice versa (note that $I$ and $O$
denote the inputs and outputs for the implementation, respectively). The tester
should not block or ignore outputs from the implementation, meaning that the 
tester should be input-enabled. If the tester intends to supply an input to the 
implementation, it should also be prepared for a refusal of that input. A verdict 
is given by means of special states pass or fail. Lastly, to give consistent verdicts, 
a tester should be deterministic. This leads to the following definition of testers. 


Definition 34. A tester for (an IA or AIA with) inputs $I$ and outputs $O$ is a
deterministic, input-enabled IA $t = (Q_t, O, I \cup \overline{I}, T_t, q_t^0)$ with pass, fail $\in Q_t$,
such that pass and fail are sink-states with out(pass) = out(fail) = $\emptyset$, and
$a \in \mathit{out}(q) \implies \overline{a} \in \mathit{out}(q)$ for all $q \in Q_t$ and $a \in I$.


Testing is performed by a special form of parallel composition of a tester and 
an implementation. If the tester chooses to perform an input while the imple- 
mentation also chooses to produce an output, this results in a race condition. In 
such a case, either the input or the output can occur during test execution. We
assume a synchronous setting, in which the implementation and specification 
agree on the order in which observed actions are performed (in contrast to e.g. a 
queue-based setting [13], in which all possible orders are accounted for). These 
assumptions are in line with the assumptions in e.g. ioco-theory [15], and lead 
to the following definition of test execution. 


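Definition 35. Let $i \in \mathcal{IA}$ be non-empty, and let $t$ be a tester for $i$. We write
$q_t \,\|\, q_i$ for $(q_t, q_i) \in Q_t \times Q_i$. Then test execution of $i$ against $t$, denoted $t \,\|\, i$, is
defined as $(Q_t \times Q_i, \emptyset, I \cup \overline{I} \cup O, T, q_t^0 \,\|\, q_i^0) \in \mathcal{IA}$, with

\[ T(q_t \,\|\, q_i, \ell) = \{q_t' \,\|\, q_i' \mid q_t \xrightarrow{\ell} q_t',\; q_i \xrightarrow{\ell} q_i'\} \text{ for } \ell \in L \]
\[ T(q_t \,\|\, q_i, \overline{a}) = \{q_t' \,\|\, q_i \mid q_t \xrightarrow{\overline{a}} q_t',\; q_i \not\xrightarrow{a}\} \text{ for } a \in I \]

We say that $i$ fails $t$ if $q_t^0 \,\|\, q_i^0 \xrightarrow{\sigma} \mathit{fail} \,\|\, q_i$ for some $\sigma$ and $q_i$, and $i$ passes $t$
otherwise.

We reuse the notions of soundness and exhaustiveness from [15], to express
whether a tester properly tests for a given specification.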


Definition 36. Let $s \in \mathcal{AIA}$ and let $t$ be a tester for $s$. Then $t$ is sound for $s$
if for all $i \in \mathcal{IA}$ with inputs $I$ and outputs $O$, $i$ fails $t$ implies $i \not\leq_{if} s$. Moreover,
$t$ is exhaustive for $s$ if for all $i \in \mathcal{IA}$, $i$ passes $t$ implies $i \leq_{if} s$.


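A simple attempt to translate specification AIA $s$ to a sound and exhaustive
tester would be similar to the determinization of $s$, but replacing every occurrence
of $\bot$ and $\top$ by fail and pass, respectively.

\[ f_s(e) = \begin{cases} \mathit{fail} & \text{if } e = \bot \\ \mathit{pass} & \text{if } e = \top \\ e & \text{otherwise} \end{cases} \]

Taking special care of input failures, the function $f_s$ then induces a tester
$(D(Q_s) \cup \{\mathit{pass}, \mathit{fail}\}, O, I \cup \overline{I}, T, f_s(e_s^0))$, with

\[ T(e, \ell) = \{f_s(e \;\mathit{after}_s\; \ell)\} \text{ for } e \in D(Q_s),\ \ell \in L \]
\[ T(v, \ell) = \begin{cases} \{v\} & \text{if } \ell \in O \\ \emptyset & \text{if } \ell \in I \end{cases} \text{ for } v \in \{\mathit{pass}, \mathit{fail}\} \]
\[ T(e, \overline{a}) = \begin{cases} \{\mathit{pass}\} & \text{if } (e \;\mathit{after}_s\; a) = \top \\ \{\mathit{fail}\} & \text{otherwise} \end{cases} \text{ for } e \in D(Q_s),\ a \in I \]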


This tester is sound and exhaustive for $s$: each possible input-failure trace is in
Ftraces(s) if and only if it does not lead to fail, by construction. Here, we
make use of the fact that $\mathit{Ftraces}(\bot) = \emptyset$, meaning that $\bot$ cannot be
implemented correctly by a non-empty IA and can thus be replaced by fail. Likewise,
$\mathit{Ftraces}(\top) = \mathit{FT}_{I,O}$ means that $\top$ is always implemented correctly, and can be
replaced by pass.
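
The behaviour of this simple tester on one observed run can be sketched as follows. The Python encoding is an assumption made for illustration: configurations are nested tuples, a refusal of input `a` is written `~a`, and the names `step` and `verdict` as well as the toy transition table are hypothetical. The function follows the specification configuration along the observations and reports pass or fail as soon as $\top$, $\bot$ or a refusal is encountered.

```python
TOP, BOT = ("top",), ("bot",)

def conj(a, b):
    """a AND b with constants folded."""
    if BOT in (a, b):
        return BOT
    return b if a == TOP else a if b == TOP else ("and", a, b)

def disj(a, b):
    """a OR b with constants folded."""
    if TOP in (a, b):
        return TOP
    return b if a == BOT else a if b == BOT else ("or", a, b)

def step(e, label, T, inputs):
    """e after one label; missing entries are TOP (input) or BOT (output)."""
    if e in (TOP, BOT):
        return e
    if e[0] == "st":
        return T.get((e[1], label), TOP if label in inputs else BOT)
    l, r = step(e[1], label, T, inputs), step(e[2], label, T, inputs)
    return conj(l, r) if e[0] == "and" else disj(l, r)

def verdict(e0, T, inputs, observations):
    """Return 'pass', 'fail', or None when no verdict state is reached."""
    state = e0
    for obs in observations:
        if state == BOT:
            return "fail"
        if state == TOP:
            return "pass"
        if obs.startswith("~"):       # the implementation refused an input
            refused = obs[1:]
            return "pass" if step(state, refused, T, inputs) == TOP else "fail"
        state = step(state, obs, T, inputs)
    return {BOT: "fail", TOP: "pass"}.get(state)

# Hypothetical toy specification (not the vending machine of the paper).
INPUTS = {"?a"}
T = {
    ("q0", "?a"): ("and", ("st", "q1"), ("st", "q2")),
    ("q1", "!x"): ("st", "q0"),
    ("q1", "!y"): ("st", "q0"),
    ("q2", "!x"): TOP,
}
START = ("st", "q0")

print(verdict(START, T, INPUTS, ["?a", "!y"]))   # forbidden output
print(verdict(START, T, INPUTS, ["?a", "~?a"]))  # refusing an unspecified input
```

A result of `None` means that the run ended in an ordinary tester state, so the observations so far neither violate nor exhaust the specification.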

However, this tester is quite inefficient. If a tester reaches pass after both $\sigma a$
and $\sigma\overline{a}$, then this input $a$ does not need to be tested after $\sigma$. Specifically, this is
the case if and only if trace $\sigma a$ leads to specification configuration $\top$. We thus
improve the tester for a given specification $s$ as follows.


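Definition 37. Let $s \in \mathcal{AIA}$. Then $\mathit{tester}(s) \in \mathcal{IA}$ is defined as
$\mathit{tester}(s) = (D(Q_s) \cup \{\mathit{pass}, \mathit{fail}\}, O, I \cup \overline{I}, T, f_s(e_s^0))$, with $f_s$ as before, and

\[ T(e, \ell) = \begin{cases} \{f_s(e \;\mathit{after}_s\; \ell)\} & \text{if } \ell \in O, \text{ or } \ell \in I \text{ and } (e \;\mathit{after}_s\; \ell) \neq \top \\ \emptyset & \text{if } \ell \in I \text{ and } (e \;\mathit{after}_s\; \ell) = \top \end{cases} \text{ for } e \in D(Q_s),\ \ell \in L \]
\[ T(e, \overline{a}) = \begin{cases} \emptyset & \text{if } (e \;\mathit{after}_s\; a) = \top \\ \{\mathit{fail}\} & \text{otherwise} \end{cases} \text{ for } e \in D(Q_s),\ a \in I \]
\[ T(v, \ell) = \begin{cases} \{v\} & \text{if } \ell \in O \\ \emptyset & \text{if } \ell \in I \end{cases} \text{ for } v \in \{\mathit{pass}, \mathit{fail}\},\ \ell \in L \]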




Example 38. The tester for $s_B$ in Figure 3 is shown in Figure 5.

Fig. 5. The tester for the vending machine. The label ?O denotes a transition for every
label in O. Remark that inputs for $s_B$ are outputs for tester($s_B$), and vice versa.


Theorem 39 shows that soundness and exhaustiveness of a tester correspond
to refinement of the corresponding AIA.

Theorem 39. Let $s_1, s_2 \in \mathcal{AIA}$. Then
1. tester($s_1$) is sound and exhaustive for IA($s_1$)
2. tester($s_1$) is sound for $s_2$ $\impliedby$ $s_2 \leq_{if} s_1$
3. tester($s_1$) is exhaustive for $s_2$ $\iff$ $s_1 \leq_{if} s_2$


4.2 Test Cases for AIA Specifications 


In [15], an algorithm was introduced to generate test cases. These are testers 
as in Definition 34 with additional restrictions, so that they can be used as 
unambiguous instructions to test a system. In particular, states of a test case 
should have at most one outgoing input transition. This ensures that no choice 
between different inputs has to be resolved during test execution. Additionally,
all paths of a test case lead to pass or fail in a finite number of steps, to ensure 
that test execution terminates with a verdict. 


Definition 40. A tester $t$ for $I$ and $O$ is a test case if

— for all $q_t \in Q_t$, $|\mathit{out}(q_t)| \leq 1$, and

— there are no infinite sequences $q_t^0, q_t^1, \ldots$ for $q_t^0, q_t^1, \ldots \in Q_t \setminus \{\mathit{pass}, \mathit{fail}\}$
such that $q_t^0 \xrightarrow{\ell_0} q_t^1 \xrightarrow{\ell_1} \cdots$


The test case generation algorithm of [15] is non-deterministic, since it must
choose at most one input in every state, and it must choose when to stop testing.
We avoid defining a separate test case generation algorithm, and instead use
Theorem 39 to obtain sound test cases. If specification $s_1$ is weakened to $s_2$, such
that tester($s_2$) is a test case, then soundness of tester($s_2$) for $s_1$ is guaranteed
by the theorem. Such a weakened singular specification $s_2$ describes a finite,
tree-shaped part of the original specification $s_1$.


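Definition 41. Let $s_1, s_2 \in \mathcal{AIA}$. Then $s_2$ is a singular specification for $s_1$ if
$Q_2$ is a finite subset of $L^*$, with $e_2^0 \in \{\epsilon, \top, \bot\}$, $e_1^0 = \top \implies e_2^0 = \top$ and
$e_2^0 = \bot \implies e_1^0 = \bot$, and having that for every $\sigma \in Q_2$, the following holds:

1. $T_2(\sigma, \ell) = \bot \implies (s_1 \;\mathit{after}\; \sigma\ell) = \bot$ for $\ell \in L$,
2. $(s_1 \;\mathit{after}\; \sigma\ell) = \top \implies T_2(\sigma, \ell) = \top$ for $\ell \in L$,
3. $T_2(\sigma, \ell)$ is either $\bot$ or $\top$ or $\sigma\ell$ for $\ell \in L$, and
4. there is at most one $a \in I$ with $T_2(\sigma, a) \neq \top$.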


A singular specification can be created from $s_1$ similarly to test case generation in [15]. In every
state $\sigma$ of the tree $s_1$, we either decide to pick one input specified in $s_1$ and also
specify that in $s_2$; or we do not specify any input, but only outputs; or we leave
any successive behaviour unspecified ($\top$).

Test cases based on singular specifications are inherently sound, and for any
incorrect implementation, it is possible to find a singular specification which
induces a test case that detects this incorrectness.


Theorem 42. If $s_2$ is a singular specification for $s_1$, then tester($s_2$) is a sound
test case for $s_1$.


Theorem 43. Let $i \in \mathcal{IA}$ and $s_1 \in \mathcal{AIA}$. If $i \not\leq_{if} s_1$, then there is a singular
specification $s_2$ for $s_1$ such that $i$ fails tester($s_2$).


Example 44. Specification $s_B$ in Figure 3 can be weakened to singular
specification $s_C$ shown in Figure 6. Indeed, $s_B \leq_{if} s_C$ holds, which can be established
by comparing $s_C$ with det($s_B$) in Figure 4. Therefore tester($s_C$) is a sound test
case for $s_B$.


5 Conclusion and Future Work 


Alternating interface automata serve as a natural and direct representation for
sets of input-failure traces, and therefore also for refinement of systems with
inputs, outputs, non-determinism and conjunction. We have used the
observational nature of input-failure traces to define testers, describing an experiment
to observationally establish refinement of a black-box system.

Fig. 6. A weakened version $s_C$ of the vending machine, and the test case tester($s_C$).
Question and exclamation marks are interchanged in tester($s_C$) to indicate that the
input and output alphabets have been interchanged with respect to $s_C$.


The disjunction and conjunction of alternation bring interface automata
specifications closer to the realm of logic and lattice theory. On the theoretical 
side, a possible direction is to extend configurations from distributive lattices to 
a full logic. On the practical side, classical testing techniques acting on logical 
expressions, such as combinatorial testing, could be translated to our black-box 
configurations of states. 

Possible criticism on our running example of a vending machine $s_B$ in Figure 3
may be that its representation as an AIA is not concise, since the determinization
det($s_B$) is much smaller and more understandable than $s_B$ itself. This is because
the individual specifications offer a choice between outputs, such as tea with
or without milk, whereas the intersection of all choices is singleton. A more
natural encoding for this example is to express the types of drink with data
parameters, and the restrictions on them by logical constraints. This requires
an automaton model in the style of symbolic transition systems [9], which could be
enriched with the concepts of alternation of AIAs.

Interface automata typically contain internal transitions, and the interaction
between internal behaviour and alternation is not immediately clear. A possible
approach to extend AIAs with internal behaviour is to lift the $\epsilon$-closure of [1],
the set of states reachable via internal transitions, to the level of configurations.

Abstract. Interaction languages such as MSC are often associated with
formal semantics by means of translations into distinct behavioral
formalisms such as automata or Petri nets. In contrast to translational
approaches we propose an operational approach. Its principle is to
identify which elementary communication actions can be immediately
executed, and then to compute, for every such action, a new interaction
representing the possible continuations to its execution. We also define
an algorithm for checking the validity of execution traces (i.e. whether or
not they belong to an interaction's semantics). Algorithms for semantic
computation and trace validity are analyzed by means of experiments.

Keywords: Interaction Language · Scenario · Semantics · Causal Order · Trace Analysis


1 Introduction 


Interaction Languages (IL) are powerful mechanisms to express behavioral
requirements in the form of scenarios called interactions. ILs include several
recognized standards such as MSC and LSC [6], HMSC [25], MSD [13],
UML-Sequence Diagrams [21] (UML-SD), etc. These graphical languages represent
parts involved in a communication scheme as vertical lines, called lifelines. Each one
highlights a succession of instants where actions (emissions or receptions of
messages) may occur. These instants are conventionally ordered from top to bottom
as illustrated (in the style of UML-SD) in Fig.1-a, where the emission of $m_1$ occurs
before that of $m_2$. However, this sequencing does not order actions occurring on
different lifelines; in Fig.1-b, even though the reception of m occurs graphically
below the emission of m, no order is enforced. As such, this specificity is called
'weak sequencing'. In order to enforce a causality relation between such
uncorrelated actions, we use a different 'strict sequencing' operator. In Fig.1-c, it is
used to express a message m passing between lifelines a and b. Here, m cannot be
received before being emitted; the origin of the arrow denotes an instant preceding
the one depicted by its target. Additional operators (e.g. UML-SD combined fragments)
enable the expression of various concepts to order actions such as parallelisation,
repetition, alternatives (illustrated in Fig.2), etc. They structure interactions and
specify relative scheduling for subscenarii.

Fig. 1: UML-SD style. (a) Default sequencing: $i = seq(a!m_1, a!m_2)$. (b) Uncorrelated
instants: $i = seq(a!m, b?m)$. (c) Message passing: $i = strict(a!m, b?m)$.


When ILs are fitted with formal semantics, requirements can be processed
using formal techniques, such as model-checking [1] or model-based testing [19].
As pointed out earlier, the key semantic concept here is the causality relation
between actions that the interaction's structure induces. Valid traces are
those respecting the subsequent partial order [27,19]. The authors of [17] define
a simple IL as a set of terms built above basic actions and provide it with a
denotational semantics which associates each
interaction term with a set of traces. This kind of formal framework can serve
as a reference for stating theorems about interactions (e.g. the 'satisfaction
condition' proven in [17]).

Fig. 2: Syntax and Positions (the whole interaction $i = i_{|\epsilon}$, with a subinteraction shown in blue).

In this paper, we consider an IL which includes several distinct loop operators 
and provide it with a denotational semantics, directly comparable to that given 
by [17]. The semantics of an interaction with loops is defined by considering any 
finite number of loop unfolding combinations. Then, we introduce a second se- 
mantics, which can be qualified as operational, as we aim at presenting it in the 
style advocated in [24]. Here, accepted traces of an interaction i are defined by 
identifying its initial actions act, and for each of those the subsequent interaction
$i'$ that will express the remainder of the trace. This operational semantics can
therefore be thought of as a set of rules of the form $i \xrightarrow{act} i'$. Doing so is however
challenging as we need to keep track of possible conflicts between actions
occurring on the same lifeline. While the operational semantics is particularly 
suitable to be adapted into concrete trace analysis algorithms, the denotational 
semantics serves as a mathematical foundation, revealing interesting algebraic 
properties. Both semantics have been implemented, and the experiments conducted
on semantic computation indicate identical results. A trace analysis tool has
also been adapted from the operational semantics and evaluated for
correctness and performance.


The paper is organized as follows: Sec.2 introduces the IL and the denota- 
tional semantics. Sec.3 and Sec.5 resp. introduce the operational semantics and 
the subsequent trace analysis algorithm while Sec.4 reports experimental results 
about the consistency of both semantics w.r.t. one another. Finally, Sec.6 and 
Sec.7 resp. discuss related works and provide concluding remarks.

2 Interaction language and denotational semantics


2.1 Base syntax 


This section provides a textual denotation of our basic IL (i.e. without loops).
Interactions are defined up to a given signature $(L, M)$ where $L$ and $M$ resp.
are sets of lifelines and messages. Their base building blocks are a set of
communication actions (actions) over $L$ and $M$: $Act(L,M) = \{l\Delta m \mid l \in L, \Delta \in
\{!,?\}, m \in M\}$ where $l!m$ (resp. $l?m$) designates the emission (resp. reception)
of the message $m$ from (resp. on) the lifeline $l$. For any action act in $Act(L, M)$
of the form $l\Delta m$, $\Theta(act)$ denotes the lifeline $l$. Actions can be composed using
different binary operators that introduce an order of execution between them
(weak or strict sequentiality, parallelism, mutual exclusivity).


Definition 1 (Basic Interactions). The set $B(L,M)$ of basic interactions
over $L$ and $M$ is inductively defined as follows:


— $\emptyset \in B(L,M)$ and $Act(L, M) \subset B(L, M)$,
— $\forall (i_1,i_2) \in B(L, M)^2$ and $\forall f \in \{strict, seq, alt, par\}$, $f(i_1,i_2) \in B(L, M)$.


The empty interaction $\emptyset$ and actions of $Act(L, M)$ are elementary
interactions. The strict and seq operators are sequential operators: in $strict(i_1, i_2)$,
all the actions in $i_1$ must take place before any action in $i_2$ while in $seq(i_1, i_2)$
sequentiality is only enforced between actions that share the same lifeline. In
Fig.1-b, $b?m$ may precede³ $a!m$ (because $a \neq b$) while in Fig.1-c $b?m$ cannot
precede $a!m$. Hence we use strict to encode the emission and reception of the
same message object e.g. $strict(a!m, b?m)$ on Fig.1-c⁴. In $alt(i_1, i_2)$, the
behaviors specified by $i_1$ and $i_2$ are both acceptable albeit mutually exclusive⁵. In
Fig.2 if $a!m_1$ happens then $b?m_2$ cannot happen and vice-versa. In $par(i_1, i_2)$,
the executions of $i_1$ and $i_2$ are interleaved. For instance, in $par(a!m_1, a!m_2)$,
actions $a!m_1$ and $a!m_2$ can happen in any order.

Interactions being defined as usual terms, we use positions expressed in
Dewey decimal notation to refer to subinteractions [7]. A position $p$ of $i$ is a
sequence of positive integers denoting a path leading from the root node of $i$ to
the subterm of $i$ at position $p$. Interactions are defined with operations whose
arity is at most 2. Hence, positions are words of $\{1,2\}^*$ i.e. words built over the
empty word $\epsilon$, the words 1 and 2 and the concatenation law ".". In the following,
we will use simplified notations without dots, e.g. "11" for the position "1.1".

In Def.2, the functions ST and pos resp. associate to any interaction the set
of all its subinteractions and the set of its positions. Moreover, we use the usual
notation $i_{|p}$ [7] to designate unambiguously the subinteraction of $i$ at position $p$
for $p \in pos(i)$ (cf. example in Fig.2).


3 Note that we omit depicting seq on diagrams as is classically done in UML-SD. 
4 drawn by convention as a plain arrow between a and b 
5 note that we handle the UML-SD opt operator as $opt(i) = alt(i, \emptyset) = alt(\emptyset, i)$

Definition 2 (Positions and subinteractions of a basic interaction). We
define $ST : B(L, M) \to \mathcal{P}(B(L,M))$, $pos : B(L, M) \to \mathcal{P}(\{1,2\}^*)$ and⁶ $\_{|\_} :
B(L, M) \times \{1,2\}^* \to B(L, M)$ such that $\forall i \in B(L, M)$:


— if $i = \emptyset$ or $i \in Act(L, M)$ then $ST(i) = \{i\}$, $pos(i) = \{\epsilon\}$ and $i_{|\epsilon} = i$
— if $i = f(i_1,i_2)$ with $f \in \{strict, seq, par, alt\}$ then:
• $ST(i) = \{i\} \cup ST(i_1) \cup ST(i_2)$
• $pos(i) = \{\epsilon\} \cup 1.pos(i_1) \cup 2.pos(i_2)$
• $i_{|\epsilon} = i$ and for $p = 1.p'$ (resp. $2.p'$) in $pos(i)$, $i_{|p} = i_{1|p'}$ (resp. $i_{2|p'}$).
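
As an illustration of Definition 2, the following Python sketch computes positions and subinteractions of a term. The encoding is an assumption made for this example: `None` stands for the empty interaction, an action is a tuple `("act", lifeline, kind, message)`, a composed interaction is `(operator, i1, i2)`, and positions are strings over the digits 1 and 2 with the empty string for $\epsilon$. The sample term is the interaction of Fig.2 as it can be read from the positions mentioned in the text.

```python
def is_elementary(i):
    """True for the empty interaction (None) and for single actions."""
    return i is None or i[0] == "act"

def positions(i):
    """pos(i): all positions of i, as strings over '1' and '2'."""
    if is_elementary(i):
        return {""}
    _, i1, i2 = i
    return ({""}
            | {"1" + p for p in positions(i1)}
            | {"2" + p for p in positions(i2)})

def subterm(i, p):
    """Subinteraction of i at position p; undefined outside pos(i)."""
    for digit in p:
        if is_elementary(i):
            raise KeyError(p)         # p is not a position of i
        i = i[int(digit)]             # index 1 or 2 selects a child
    return i

def subinteractions(i):
    """ST(i): the set of all subinteractions of i."""
    return {subterm(i, p) for p in positions(i)}

# seq(alt(a!m1, b?m2), a!m3), the term suggested by the text for Fig.2.
fig2 = ("seq",
        ("alt", ("act", "a", "!", "m1"), ("act", "b", "?", "m2")),
        ("act", "a", "!", "m3"))

print(sorted(positions(fig2)))
print(subterm(fig2, "12"))
```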


2.2 Denotational semantics for basic interactions 


As explained in Sec.2.1, operators occurring in an interaction induce relations
of precedence between the actions of the interaction. In the example of Fig.2,
if the left branch of the alt is chosen (i.e. $a!m_1$ at position 11) then the action
$a!m_3$ at position 2 must occur after it. However if the other branch were chosen
(i.e. $b?m_2$ at position 12), there would be no precedence order between actions
$b?m_2$ and $a!m_3$ as their common ancestor is a seq operator which only orders
actions sharing the same lifeline. As a result, several orderings can be defined,
depending, among others, on the choice of alt branches. These possible orderings
can be encoded as a set $ord(i)$ (defined in Def.4) which contains elements of
the form $(e,o)$ where $e$ is the set of positions of the involved actions and $o$
reflects the precedence relations between those. In the example of Fig.2, we have
$ord(i) = \{(\{11, 2\}, \{(11, 2)\}), (\{12, 2\}, \emptyset)\}$. Indeed, as explained earlier, if the
11 branch is chosen then the only two actions to be considered are $a!m_1$ and
$a!m_3$ on resp. positions 11 and 2 (therefore $e = \{11,2\}$) and they are ordered
because of both the seq operator and their common lifeline, so that the associated
precedence relation is modelled by $o = \{(11,2)\}$ meaning that $a!m_1$ at position
11 should occur before $a!m_3$ at position 2. The only other possible ordering
occurs when branch 12 is chosen and likewise we would have $e = \{12,2\}$ with
$o = \emptyset$ because the seq does not constrain the order of actions $b?m_2$ and $a!m_3$
with different lifelines.
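
The orderings of this example can be reproduced with a short Python sketch. It is not the paper's Def.4 but an illustration built from the informal reading of the operators in Sec.2.1: alt keeps the orderings of either branch, par adds no precedence, strict orders every action of the left child before every action of the right child, and seq does so only for actions on the same lifeline. The tuple encoding of interactions and all function names are assumptions.

```python
def subterm(i, p):
    """Subinteraction of i at position p (a string over '1' and '2')."""
    for digit in p:
        if i is None or i[0] == "act":
            raise KeyError(p)         # p is not a position of i
        i = i[int(digit)]
    return i

def prefix(d, ordering):
    """d.(e, o): prepend d to every position occurring in an ordering."""
    e, o = ordering
    return (frozenset(d + p for p in e),
            frozenset((d + p1, d + p2) for p1, p2 in o))

def orderings(i):
    """Set of orderings (e, o) of i, with e and o given as frozensets."""
    if i is None:
        return {(frozenset(), frozenset())}
    if i[0] == "act":
        return {(frozenset({""}), frozenset())}
    op, i1, i2 = i
    left = {prefix("1", x) for x in orderings(i1)}
    right = {prefix("2", x) for x in orderings(i2)}
    if op == "alt":                   # exactly one branch is executed
        return left | right
    result = set()
    for e1, o1 in left:
        for e2, o2 in right:
            cross = {(p1, p2) for p1 in e1 for p2 in e2}
            if op == "par":           # no order between the two children
                cross = set()
            elif op == "seq":         # order only actions on one lifeline
                cross = {(p1, p2) for p1, p2 in cross
                         if subterm(i, p1)[1] == subterm(i, p2)[1]}
            result.add((e1 | e2, o1 | o2 | frozenset(cross)))
    return result

# seq(alt(a!m1, b?m2), a!m3), the term suggested by the text for Fig.2.
fig2 = ("seq",
        ("alt", ("act", "a", "!", "m1"), ("act", "b", "?", "m2")),
        ("act", "a", "!", "m3"))

for e, o in sorted(orderings(fig2), key=lambda x: sorted(x[0])):
    print(sorted(e), sorted(o))
```

For the sample term the sketch yields the two orderings discussed above: positions 11 and 2 with the pair (11, 2), and positions 12 and 2 with no precedence pair.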


Definition 3 (Ordering type). Given $i$ in $B(L, M)$. The set $O(i)$ of candidate
orderings of $i$ contains all couples $(e,o)$ such that (1) $e \subset pos(i)$, (2) for any $p$
in $e$, $i_{|p} \in Act(L, M)$ and (3) $o \subset e \times e$. $O$ is then the set $\bigcup_{i \in B(L,M)} O(i)$.


In Def.4, for a given interaction i, ord(i) precisely defines which orderings
are to be considered among the candidate orderings O(i). For an ordering
(e, o) in O and p ∈ {1,2}, we use the notation p.e = {p.p' | p' ∈ e}, p.o =
{(p.p1, p.p2) | (p1, p2) ∈ o} and p.(e, o) = (p.e, p.o). The notation is canonically
extended to any set O of orderings, by p.O = {p.(e, o) | (e, o) ∈ O}.

For the interaction ∅, there is no associated action and therefore we have a
single (e, o) = (∅, ∅). For a ∈ Act(L, M), there is a single action a (at position
ε) and as a result, ord(a) contains a single (e, o) = ({ε}, ∅). For i = alt(i1, i2),
either i1 or i2 is executed. Thus any ordering in ord(i) is simply an ordering
from ord(i1) or from ord(i2) but correctly prefixed. Concretely, for any orderings
(e1, o1) ∈ ord(i1) and (e2, o2) ∈ ord(i2), ord(i) contains both 1.(e1, o1) and
2.(e2, o2). For i = par(i1, i2), both i1 and i2 have to be executed but no order is
enforced between actions of either child branch. Thus, for any orderings (e1, o1) ∈
ord(i1) and (e2, o2) ∈ ord(i2), ord(i) contains (1.e1 ∪ 2.e2, 1.o1 ∪ 2.o2). For i =
strict(i1, i2) both i1 and i2 have to be executed and all actions from i1 must occur
before actions from i2. Thus for any orderings (e1, o1) ∈ ord(i1) and (e2, o2) ∈
ord(i2), ord(i) contains an ordering (e, o) that concerns all actions from both
children, i.e. e = 1.e1 ∪ 2.e2, and such that o keeps track of all initial precedence
relations while incorporating those induced by the strict operator, i.e. o = 1.o1 ∪
2.o2 ∪ {(p1, p2) | p1 ∈ 1.e1, p2 ∈ 2.e2}. For i = seq(i1, i2) the same reasoning can
be applied, with the exception that additional precedence relations only concern
actions that share the same lifeline. Using the same notations, e = 1.e1 ∪ 2.e2
and o = 1.o1 ∪ 2.o2 ∪ {(p1, p2) | p1 ∈ 1.e1, p2 ∈ 2.e2, Θ(i|p1) = Θ(i|p2)}.

Footnote (on the function _|_ of Def.2): it is a partial function, so that i|p is
only defined for positions p occurring in pos(i).


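Definition 4 (Orderings of a basic interaction). We define the function
ord : B(L, M) → P(O) as follows:


\[ ord(\varnothing) = \{(\emptyset, \emptyset)\} \quad \text{and} \quad \forall act \in Act(L, M),\ ord(act) = \{(\{\varepsilon\}, \emptyset)\} \]


For any i1 and i2 in B(L, M):

\[ ord(alt(i_1, i_2)) = 1.ord(i_1) \cup 2.ord(i_2) \]

\[ ord(par(i_1, i_2)) = \bigcup_{\substack{(e_1,o_1) \in ord(i_1) \\ (e_2,o_2) \in ord(i_2)}} \{ (1.e_1 \cup 2.e_2,\ 1.o_1 \cup 2.o_2) \} \]

\[ ord(strict(i_1, i_2)) = \bigcup_{\substack{(e_1,o_1) \in ord(i_1) \\ (e_2,o_2) \in ord(i_2)}} \left\{ (e, o) \ \middle|\ \begin{array}{l} e = 1.e_1 \cup 2.e_2,\quad o = 1.o_1 \cup 2.o_2 \cup o' \\ o' = \{ (p_1, p_2) \mid p_1 \in 1.e_1,\ p_2 \in 2.e_2 \} \end{array} \right\} \]

\[ ord(seq(i_1, i_2)) = \bigcup_{\substack{(e_1,o_1) \in ord(i_1) \\ (e_2,o_2) \in ord(i_2)}} \left\{ (e, o) \ \middle|\ \begin{array}{l} e = 1.e_1 \cup 2.e_2,\quad o = 1.o_1 \cup 2.o_2 \cup o' \\ o' = \{ (p_1, p_2) \mid p_1 \in 1.e_1,\ p_2 \in 2.e_2,\ \Theta(i_{|p_1}) = \Theta(i_{|p_2}) \} \end{array} \right\} \]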
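
The recursive structure of ord can be sketched in Python as follows. This is only an illustration under assumptions made for the example: interactions are encoded as nested tuples, positions as strings over '1'/'2', an ordering as a pair of frozensets, and the name `ord_` is chosen to avoid shadowing Python's built-in `ord`.

```python
from itertools import product

# Assumed encoding: ("empty",), ("act", lifeline, kind, msg), (op, i1, i2)
# with op in {"strict", "seq", "par", "alt"}; positions are strings, '' is epsilon.


def subterm(i, p):
    """Sub-interaction i|p for a position p of i."""
    return i if p == "" else subterm(i[int(p[0])], p[1:])


def shift(d, ordering):
    """d.(e, o): prefix every position of the ordering with the digit d."""
    e, o = ordering
    return (frozenset(d + p for p in e), frozenset((d + p, d + q) for p, q in o))


def ord_(i):
    """Orderings of a basic interaction, as a set of (e, o) pairs."""
    kind = i[0]
    if kind == "empty":
        return {(frozenset(), frozenset())}
    if kind == "act":
        return {(frozenset({""}), frozenset())}
    left = {shift("1", x) for x in ord_(i[1])}
    right = {shift("2", x) for x in ord_(i[2])}
    if kind == "alt":
        return left | right
    result = set()
    for (e1, o1), (e2, o2) in product(left, right):
        # strict orders every left action before every right action;
        # seq only does so for actions on the same lifeline; par adds nothing.
        extra = frozenset(
            (p, q) for p, q in product(e1, e2)
            if kind == "strict"
            or (kind == "seq" and subterm(i, p)[1] == subterm(i, q)[1]))
        result.add((e1 | e2, o1 | o2 | extra))
    return result


# seq(alt(a!m1, b?m2), a!m3), the example discussed in Sec.2.2
fig2 = ("seq", ("alt", ("act", "a", "!", "m1"), ("act", "b", "?", "m2")),
        ("act", "a", "!", "m3"))
```

On `fig2` the function returns the two orderings given in the text: positions {11, 2} with the single precedence pair (11, 2), and positions {12, 2} with no precedence pair.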


A given ordering (e, o) with e = {e1, ..., en} characterizes a set of behaviors
that expresses every action whose position belongs to e exactly once. Such a
behavior is thus given under the form of an execution trace i|e_α(1). ... .i|e_α(n)
where α is a permutation of [1, n]. Obviously, not all of those permutations are
acceptable as they must not contradict the partial order specified by o. If we note
pj = e_α(j) for j in [1, n], we have ∀(j, k) ∈ [1, n]², j > k ⇒ (pj, pk) ∉ o.

The semantics σ(i) of an interaction i then comes naturally as the union
of all sets sem(i, e, o) of execution traces of i compatible with (e, o) ∈ ord(i).
When considering the example from Fig.2, we have sem(i, {11, 2}, {(11, 2)}) =
{a!m1.a!m3} and sem(i, {12, 2}, ∅) = {b?m2.a!m3, a!m3.b?m2}.
Definition 5 (Denotational semantics for basic interactions). For i ∈
B(L, M) and (e, o) ∈ ord(i) with n ∈ ℕ being the cardinal of e, we note:


\[ sem(i, e, o) = \{\, i_{|p_1} . \ldots . i_{|p_n} \mid \forall (p_j, p_k) \in e^2,\ j > k \Rightarrow p_j \neq p_k \wedge (p_j, p_k) \notin o \,\} \]


σ : B(L, M) → P(Act(L, M)*) is s.t. ∀i ∈ B(L, M),

\[ \sigma(i) = \bigcup_{(e,o) \in ord(i)} sem(i, e, o) \]
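
To make sem concrete, the following Python sketch enumerates the permutations of the positions in e and keeps those compatible with o. It is a brute-force illustration (factorial in the size of e), not an efficient algorithm; the dictionary `actions`, which maps positions to printable actions, is an assumption of this example.

```python
from itertools import permutations


def sem(actions, e, o):
    """Traces visiting every position of e once without contradicting o.

    actions maps each position to a printable action. A permutation is kept
    when every pair (p, q) of o has p occurring before q.
    """
    traces = set()
    for perm in permutations(sorted(e)):
        rank = {p: k for k, p in enumerate(perm)}
        if all(rank[p] < rank[q] for p, q in o):
            traces.add(".".join(actions[p] for p in perm))
    return traces


# Actions of seq(alt(a!m1, b?m2), a!m3), indexed by their positions.
actions = {"11": "a!m1", "12": "b?m2", "2": "a!m3"}
left_branch = sem(actions, {"11", "2"}, {("11", "2")})
right_branch = sem(actions, {"12", "2"}, set())
```

Here `left_branch` contains the single trace a!m1.a!m3 while `right_branch` contains both interleavings of b?m2 and a!m3, as in the example from the text.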


2.3 Extension of the language with loops 


A loop is a repetition operator. Its content can be instantiated any finite number
of times, i.e. multiple copies of it are inserted into the interaction. For UML-SD,
the norm [23] states that "the loop construct represents a recursive application
of the seq operator where the loop operand is sequenced after the result of earlier
iterations". The UML-SD loop is hence associated with the seq operator. When
instantiated, the loop content is ordered using seq; this means for example that
loop(a!m) becomes seq(a!m, loop(a!m)), then seq(a!m, seq(a!m, loop(a!m))) and
so on. In line with this explanation, let us consider the 4 types of loops that
can be characterized according to the operator ordering the instantiated content
(seq, strict, par or alt). We can discard alt, as instantiating loop_alt(i) would lead
to alt(i, loop_alt(i)), meaning that the content can be read at most once and is
therefore equivalent to opt(i) (i.e. alt(i, ∅)). We will here consider 3 operators
denoted loop_seq (the classical loop), loop_strict and loop_par.


Fig. 3: Examples showcasing the pertinence of loop_strict and loop_par.
Panels: (a-i) i_a, (a-ii) i'_a after a!m, (b-i) i_b, (b-ii) i'_b after a!m1.


In Fig.3-a-i, i_a|11 = a!m is the only immediately executable action and its
execution leads to the interaction i'_a = strict(b?m, i_a) drawn on Fig.3-a-ii. Because
of the strict operator, i'_a|211 = a!m is not immediately executable (it is preceded by
i'_a|1 = b?m). As a result t_a = a!m.a!m.b?m.b?m is not an accepted trace for
i_a. However, if there was a seq operator instead of the strict, i'_a|211 would be
immediately executable and t_a an accepted trace.

Similarly, in Fig.3-b-i, i_b|11 = a!m1 is the only immediately executable action
and its execution leads to i'_b = par(a!m2, i_b) drawn on Fig.3-b-ii. Because of
the par operator, i'_b|211 = a!m1 is immediately executable. As a result t_b =
a!m1.a!m1.a!m2.a!m2 is an accepted trace for i_b. However, if there was a seq
instead of the par, i'_b|211 would not be immediately executable and t_b not an
accepted trace.

Consequently, considering loop_par and loop_strict in addition to the classic
loop_seq improves expressiveness. In rough terms, loop_par always allows new
instantiations, as each instance is executed in parallel w.r.t. the others and the
loop itself. loop_strict on the contrary does not allow new instantiations as long as
the previous instance has not been entirely executed. The behavior of loop_seq is
somewhat in the middle, instantiations being allowed depending on the current
structure of actions preceding and within the loop.

In the following, we extend our IL to loops and adapt previous definitions
(from B(L, M) to I(L, M)). As in Def.6, any time we do so, we will only define
the missing cases concerning loop terms.


Definition 6 (Interactions). The set I(L, M) of interactions over L and M
is inductively defined as follows:


- ∅ ∈ I(L, M) and Act(L, M) ⊂ I(L, M),
- ∀(i1, i2) ∈ I(L, M)² and ∀f ∈ {strict, seq, alt, par}, f(i1, i2) ∈ I(L, M),
- ∀i ∈ I(L, M) and ∀f ∈ {strict, seq, par}, loop_f(i) ∈ I(L, M).


The functions ST : I(L, M) → P(I(L, M)), pos : I(L, M) → P({1,2}*)
and _|_ : I(L, M) × {1,2}* → I(L, M) are defined by extending to loop terms
the corresponding functions of Def.2:
For all i in I(L, M) of the form loop_f(i') with f ∈ {strict, seq, par}:


- ST(i) = {i} ∪ ST(i')
- pos(i) = {ε} ∪ 1.pos(i'),
- i|ε = i and for p = 1.p' in pos(i), i|p = i'|p'.


In order to define the semantics of interactions, we use the notion of term
replacement [7]: the notation t[s]p denotes the term t where its subterm at
position p is replaced by the term s. For instance with i = seq(a!m, b?m),
we have i[c?m]2 = seq(a!m, c?m). This notation is convenient to represent terms
obtained by loop unfolding. For example let us consider an interaction i ∈ I(L, M)
with a loop_seq at a position p ∈ pos(i), that is, such that i|p = loop_seq(i|p.1).
The interaction obtained from i by unfolding once the loop at position p is then
i[seq(i|p.1, i|p)]p. In Def.7, the set Υ(i, n) of all n-unfoldings of an interaction i
(i.e. the set of all interactions resulting from n instantiations of any loop from i)
is defined recursively. On Fig.4 loop unfolding is illustrated with Υ(i, 0) = {i}
and Υ(i, 1) = {i'}.


Fig. 4: Unfolding. (a) i = loop_seq(i|1) with i|1 = strict(a!m, b?m); (b) i' = seq(i|1, i).


Definition 7 (n-unfoldings). We define Υ : I(L, M) × ℕ → P(I(L, M)) such
that ∀i ∈ I(L, M), Υ(i, 0) = {i} and ∀n ∈ ℕ⁺:


\[ \Upsilon(i, n) = \bigcup_{p \in pos(i) \text{ s.t. } i_{|p} = loop_f(i_{|p.1})} \Upsilon\big( i[f(i_{|p.1}, i_{|p})]_p,\ n - 1 \big) \]
We define a function F : I(L, M) → B(L, M) that flattens interactions with
loops, i.e. that replaces all loop subterms with the empty interaction ∅. For
instance, in Fig.4 we have F(i) = ∅ and F(i') = seq(i|1, ∅). As F(I(L, M)) ⊆
B(L, M), we can define an unfolding-based semantics for i ∈ I(L, M) by simply
considering the union of semantics obtained from flattened unfoldings of i.


Definition 8 (Denotational semantics for interactions).
We define σ_u : I(L, M) → P(Act(L, M)*) such that for all i in I(L, M):


\[ \sigma_u(i) = \bigcup_{n \in \mathbb{N}} \ \bigcup_{i' \in \Upsilon(i, n)} \sigma(F(i')) \]


3 Operational Semantics 


We aim to define algorithms that can determine whether or not a trace t is
accepted by an interaction i. This amounts to ascertaining whether or not t ∈ σ_u(i).
Naturally, being able to do so without having to compute σ_u(i) is preferable. In
the following we refer to this problem as 'trace analysis'.

As per Sec.2.3, asserting t ∈ σ_u(i) equates to finding a combination of loop
unfoldings i* ∈ ⋃_{k ≥ 0} Υ(i, k) such that t ∈ σ(F(i*)). Even if feasible, this would
be time and space consuming (and would not be adaptable if one considers an
extension to monitoring, as new combinations i* may be needed every time a new
action is observed). As for non acceptation, it equates to proving that
∀i* ∈ ⋃_{k ≥ 0} Υ(i, k) we have t ∉ σ(F(i*)). In this case, a termination in finite
time would not even be guaranteed and would require defining some stopping
criterion on the unfolding.

Consequently, we investigate another approach, in which traces are analyzed
action by action. Here, instead of systematically unfolding loops, we do so on
demand (when executing an act that is found within a loop). This approach
is based on a different semantics (σ_o) whose description is the purpose of Sec.3.

σ_o is presented in the style of operational semantics, i.e. consisting in:
(1) identifying from the structure of i which act can be immediately executed
(coined 'frontier actions') and (2) deriving for each such act a new interaction i'
specifying all the possible continuations of act within the set of execution traces
specified by i (noted as i --act--> i').

Intuitively, an action is in the frontier iff no structural operators (parent
nodes) coerce it to be preceded by another action (sibling leaf). Accepted traces
are then built recursively through the successive consumption of actions. Let us
consider a trace t = act1.(...).actn with ∀k ∈ [1, n], i_{k-1} --act_k--> i_k and such
that i_0 = i (by extension we may note i --t--> i_n).
• If the last interaction i_n can express the empty trace ε (i.e. ε ∈ σ_o(i_n)), which
can be statically analysed, then t is accepted by i, i.e. t ∈ σ_o(i).
• In any case, for all frontier actions act_{n+1} of i_n, we have i_n --act_{n+1}--> i_{n+1},
meaning that t can be extended by act_{n+1} and is a prefix of some trace(s) accepted by i.


Fig. 5: Operational Semantics


Footnote (on the unfolding-based semantics of Sec.2.3): it is coined σ_u, u standing
for 'unfolding-based'.
To illustrate this, let us consider the example from Fig.5. The initial interaction
is i = seq(alt(a!m1, b?m2), a!m3). There are 3 frontier actions that may play the
role of act: i|11 = a!m1, i|12 = b?m2 and i|2 = a!m3. The interactions remaining
after the execution of i|11 and i|12 (resp. referred to as i'1 and i'2), which happen
to be the same, are depicted below on the left, while the one remaining after the
execution of i|2 (noted i'3) is depicted on the right. The cases leading to i'1 and i'2
are self-evident. As for the one leading to i'3, the execution of a!m3 is contingent
on the choice of the branch 12 of the alt, hence the elimination of branch 11 in
the remaining interaction. Indeed, if branch 11 were to be chosen, the execution
of a!m3 would not be possible as a!m1 should have been executed before. This
illustrates that a!m3 is a frontier action up to the choice of the right branch of the
alt operator. Let us remark that b?m2 may indeed happen after a!m3 as those
two actions occur on different lifelines and the top seq operator structuring them
does not constrain their order of execution. Finally, we conclude by defining the
operational semantics as σ_o(i) = a!m1.σ_o(i'1) ∪ b?m2.σ_o(i'2) ∪ a!m3.σ_o(i'3).


3.1 Frontier actions 


In this section we explain how to identify frontier actions. Our notion of frontier
differs slightly from that of [4], where it refers to the set of positions p such
that ∀j ∈ {1,2}*, p.j ∉ pos(i) (i.e. positions of leaf nodes). Indeed, our frontiers
contain only leaves that are immediately executable actions.

Any ordering as defined in Def.4 provides a partial order relation for the set
of (positions of) actions of a basic interaction. A frontier action act on position
p is then simply a minimal element given such a relation (e, o), i.e. s.t. ∀p' ∈ e
we have (p', p) ∉ o, i.e. act does not have to be preceded by any other action. The
frontier of an interaction i is then defined as the union of such p, considering all
the orderings from ord(i). As Def.4 did not include loop operators, we extend it
in the following definition, in which the empty ordering (∅, ∅) corresponds to the
case where the loop has not unfolded. According to this, the frontier of i from
Fig.5 is then front(i) = {11, 12, 2}.


Definition 9 (Ordering). We define ord : I(L, M) → P(O) as an extension
to I(L, M) of its counterpart from Def.4. For all f in {strict, seq, par}:


\[ \forall i \in I(L, M),\ ord(loop_f(i)) = 1.ord(i) \cup \{(\emptyset, \emptyset)\} \]

Definition 10 (Frontier). front : I(L, M) → P({1,2}*) is the function s.t.:


\[ \forall i \in I(L, M),\ front(i) = \bigcup_{(e,o) \in ord(i)} \{\, p \in e \mid \forall p' \in e,\ (p', p) \notin o \,\} \]
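
The frontier can be computed directly from these two definitions. The Python sketch below is an illustration under assumptions made for this example: interactions are nested tuples, a loop is written `("loop", f, body)` with its body at position 1, positions are strings over '1'/'2', and `ord_` is named so as not to shadow Python's built-in `ord`.

```python
from itertools import product

# Assumed encoding: ("empty",), ("act", lifeline, kind, msg), (op, i1, i2)
# with op in {"strict", "seq", "par", "alt"}, and ("loop", f, body).


def subterm(i, p):
    """Sub-interaction i|p; the body of a loop sits at position 1."""
    if p == "":
        return i
    child = i[2] if i[0] == "loop" else i[int(p[0])]
    return subterm(child, p[1:])


def shift(d, ordering):
    """d.(e, o): prefix every position of the ordering with the digit d."""
    e, o = ordering
    return (frozenset(d + p for p in e), frozenset((d + p, d + q) for p, q in o))


def ord_(i):
    """Orderings of an interaction, loops included."""
    kind = i[0]
    if kind == "empty":
        return {(frozenset(), frozenset())}
    if kind == "act":
        return {(frozenset({""}), frozenset())}
    if kind == "loop":  # orderings of the body, or the loop is not unfolded
        return {shift("1", x) for x in ord_(i[2])} | {(frozenset(), frozenset())}
    left = {shift("1", x) for x in ord_(i[1])}
    right = {shift("2", x) for x in ord_(i[2])}
    if kind == "alt":
        return left | right
    result = set()
    for (e1, o1), (e2, o2) in product(left, right):
        extra = frozenset(
            (p, q) for p, q in product(e1, e2)
            if kind == "strict"
            or (kind == "seq" and subterm(i, p)[1] == subterm(i, q)[1]))
        result.add((e1 | e2, o1 | o2 | extra))
    return result


def front(i):
    """Positions that are minimal in at least one ordering of i."""
    return {p for e, o in ord_(i) for p in e if all((q, p) not in o for q in e)}


a_m, b_m = ("act", "a", "!", "m"), ("act", "b", "?", "m")
fig5 = ("seq", ("alt", ("act", "a", "!", "m1"), ("act", "b", "?", "m2")),
        ("act", "a", "!", "m3"))
loop_example = ("loop", "strict", ("strict", a_m, b_m))
```

For `fig5` this yields the frontier {11, 12, 2} stated in the text, and for `loop_example` only position 11 (the emission a!m) is in the frontier.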
3.2 Pruning


The design of the rules i --act--> i' hinted at earlier is made operational thanks to
2 mechanisms: pruning and execution. Given an action act ∈ front(i), branches
preventing its execution are detected and eliminated with pruning. However, this
is not done on the whole interaction i but rather on specific neighboring (w.r.t.
act) subinteractions. Execution orchestrates the calls to pruning, eliminates act
and constructs the remaining interaction i'.


Fig. 6: Example showcasing pruning. Panels: (a) i; (b) i annotated (red: action to
execute, green: neighbors to prune, blue: pruning); (c) effect of pruning;
(d) after executing i|22 = a!m4.


We first define the pruning mechanism, which consists in removing from an
interaction all the actions which occur on a given lifeline. For instance, on Fig.6-b,
let us consider the interactions i1 = i|1 = loop_seq(strict(a!m1, b?m1)) and
i2 = i|21 = loop_seq(alt(a!m2, b?m3)) highlighted in green. We want to remove
actions occurring on the lifeline a (so as to allow the execution of i|22 = a!m4).
We find that i1|11 = a!m1 (resp. i2|11 = a!m2) needs to be removed from i1 (resp.
i2). If we do not want to get an interaction which is inconsistent or outwardly
contradicts the original semantics, we can only prune subinteractions at positions
where branching choices are made, i.e. in alt and loop nodes. Indeed, by definition,
eliminating a subinteraction at one such node would lead to a semantics that is
included in the original.

In i2, eliminating i2|11 is easily done given that its parent node is an alt
and that its sibling node does not need to be eliminated. Indeed, it suffices to
operate the replacement i2[i2|12]1, i.e. replacing the alt node with its right child
b?m3.

In i1, eliminating i1|11 is more delicate: its parent node is a strict and as
such, behaviors from its left and right children must both happen (there is no
branching choice). Thus, if we want to eliminate i1|11 we must also eliminate
the whole i1|1. The problem is hence forwarded upwards in the syntax. The
parent i1|ε is a loop operator, which characterizes a branching choice. We can
eliminate the problematic branch by choosing not to instantiate the loop, i.e. via
the replacement i1[∅]ε.
The pruning mechanism is given in Def.11 as the recursive prune function,
which takes as arguments an interaction i and a lifeline l. prune eliminates from
i branching choices hosting actions that occur on l.

In a first descending phase, prune goes down the syntax of i through recursive
calls (from root to leaves). When reaching a leaf, prune returns an interaction
i' and a boolean b. b = ⊤ signifies that the current branch needs to be eliminated
(pruned) while i' is the interaction that will be used to reconstruct i in
the ascending phase (only used if b = ⊥). Leaves are either actions or empty
interactions. For an action act, if Θ(act) = l, the current branch must be pruned
so prune(act, l) = (∅, ⊤): the value of the returned interaction i' has no
importance here because a parent will be pruned anyway. If Θ(act) ≠ l we have
prune(act, l) = (act, ⊥) because there is nothing to prune here. Similarly, we
have prune(∅, l) = (∅, ⊥).

In the second, ascending phase, the pruned interaction is reconstructed
according to the values of i' and b returned from child branches. If at any point
b = ⊤, this value is forwarded upwards until an expendable branching choice is
reached.

prune(i, l) is recursively called on the child nodes of i. Depending on the
operator in i, the return values of prune(i|1, l) = (i'1, b1) (and also prune(i|2, l) =
(i'2, b2) for binary operators) will be used differently to determine i' and b.

For the operators f ∈ {strict, seq, par}, if any one child must be pruned
(b1 ∨ b2) then the whole branch must also be pruned and otherwise a reconstructed
f(i'1, i'2) is returned. For the exclusive alternative alt, if no branch needs
pruning, alt(i'1, i'2) is returned; if a single branch needs pruning, prune returns
the one that does not need to be pruned; and if both branches need pruning, then
the whole interaction is pruned. For the repetition operators, if the loop content
needs pruning then the choice of 'never taking the loop' is made, meaning
that ∅ is returned with b = ⊥, signifying a successful pruning. If no pruning is
needed, it simply returns the loop with an already pruned loop content
loop_f(i'1).


Definition 11 (Pruning). prune : I(L, M) × L → I(L, M) × bool is the function
such that for all i ∈ I(L, M) and l ∈ L:


- prune(∅, l) = (∅, ⊥)
- for act ∈ Act(L, M): if Θ(act) = l then prune(act, l) = (∅, ⊤) (else (act, ⊥))
- if i = f(i1, i2) with f ∈ {strict, seq, par}, given prune(i1, l) = (i'1, b1) and
  prune(i2, l) = (i'2, b2):
  if b1 ∨ b2 then prune(i, l) = (∅, ⊤) (else (f(i'1, i'2), ⊥))
- if i = alt(i1, i2), given prune(i1, l) = (i'1, b1) and prune(i2, l) = (i'2, b2):
  • if b1 ∧ b2 then prune(i, l) = (∅, ⊤)
  • if b1 ∧ ¬b2 then prune(i, l) = (i'2, ⊥)
  • if ¬b1 ∧ b2 then prune(i, l) = (i'1, ⊥)
  • if ¬b1 ∧ ¬b2 then prune(i, l) = (alt(i'1, i'2), ⊥)
- if i = loop_f(i1) with f ∈ {strict, seq, par}, given prune(i1, l) = (i'1, b1):
  if b1 then prune(i, l) = (∅, ⊥) (else (loop_f(i'1), ⊥))
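
The case analysis of Def.11 translates almost line by line into code. The following Python sketch is an illustration only; the tuple encoding of interactions (with loops written `("loop", f, body)`) and the use of Python booleans for ⊤/⊥ are assumptions of this example.

```python
# Assumed encoding: ("empty",), ("act", lifeline, kind, msg), (op, i1, i2)
# with op in {"strict", "seq", "par", "alt"}, and ("loop", f, body).
EMPTY = ("empty",)


def prune(i, l):
    """Return (i', b); b is True when the whole branch has to be eliminated."""
    kind = i[0]
    if kind == "empty":
        return EMPTY, False
    if kind == "act":
        return (EMPTY, True) if i[1] == l else (i, False)
    if kind == "loop":
        body, b = prune(i[2], l)
        # A loop is a branching choice: never taking it is always possible.
        return (EMPTY, False) if b else (("loop", i[1], body), False)
    (i1, b1), (i2, b2) = prune(i[1], l), prune(i[2], l)
    if kind == "alt":
        if b1 and b2:
            return EMPTY, True
        if b1:
            return i2, False
        if b2:
            return i1, False
        return ("alt", i1, i2), False
    # strict, seq, par: both children are mandatory
    return (EMPTY, True) if (b1 or b2) else ((kind, i1, i2), False)


# The two neighbors discussed for Fig.6, pruned w.r.t. lifeline a.
b_m3 = ("act", "b", "?", "m3")
i1 = ("loop", "seq", ("strict", ("act", "a", "!", "m1"), ("act", "b", "?", "m1")))
i2 = ("loop", "seq", ("alt", ("act", "a", "!", "m2"), b_m3))
```

On these inputs, `prune(i1, "a")` chooses not to instantiate the loop and returns the empty interaction, while `prune(i2, "a")` keeps the loop but replaces its alt content by b?m3; in both cases the returned boolean is false.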


Revisiting Semantics of Interactions for Trace Validity Analysis 493 


3.3 Execute function and operational semantics 


Let us consider the example i from Fig.6. We wish to execute the frontier action
i|22 = a!m4 (highlighted in red). To allow this execution we first need to
remove the actions occurring on the same lifeline (i.e. on a) from the neighbors
highlighted in green. To do so, we use the prune function from Def.11. More
generally, the nature of our syntax is such that, for the execution of a frontier
action at position p, we only need to prune subinteractions at positions p0.1 s.t.
∃p' ∈ {1,2}* s.t. p = p0.2.p' and s.t. i|p0 = seq(i|p0.1, i|p0.2). Those are exactly
the left cousins of i|p that are scheduled sequentially (i.e. with seq) w.r.t. i|p.

We now define the execution function χ (Def.12), which takes as arguments
an interaction i and a frontier position p and returns the remaining interaction
i'. As explained earlier, χ orchestrates the use of prune. In the example from
Fig.6 this first cleaning feature would result in the transformation of i from the
diagram on Fig.6-a to the one on Fig.6-c. The only thing left to do is then to
remove the executed action, s.t. the result is the interaction from Fig.6-d.

χ is defined inductively on both the structure of the interaction i and the
position p = d1...dn ∈ {1,2}^n. The execution of χ(i, p) traverses recursively the
syntactic structure of i guided by the path defined by the position p, that is,
from χ(i|ε, d1...dn) (root node), ..., up to χ(i|p, ε) (target action leaf to execute).
Here, χ(i|p, ε) = ∅ constitutes the stopping criterion and i' is then constructed
when the algorithm goes back up through the syntactic structure of i. Assigning
∅ to χ(i|p, ε) ensures that the action i|p is removed in the construction of i'.

When a par node is encountered during the upward traversal, i.e. for j ∈
[1, n], i|d1...dj = par(i|d1...dj.1, i|d1...dj.2), then χ(i|d1...dj, dj+1...dn) is simply:

par(χ(i|d1...dj.1, dj+2...dn), i|d1...dj.2) if dj+1 = 1 or,

par(i|d1...dj.1, χ(i|d1...dj.2, dj+2...dn)) if dj+1 = 2.

Indeed, as par specifies parallel executions, there is no need for pruning.

When an alt node is reached, using the same notations, we would have:

χ(i|d1...dj, dj+1...dn) = χ(i|d1...dj+1, dj+2...dn).

Indeed, we can 'skip' the alt node itself and replace it directly with the interaction
resulting from the execution of the chosen branch.

When a loop is reached, i.e. i|d1...dj = loop_f(i|d1...dj.1) (with a mandatory
dj+1 = 1), we have:

χ(i|d1...dj, dj+1...dn) = f(χ(i|d1...dj+1, dj+2...dn), i|d1...dj).

Indeed, the execution is done on a copy of the loop content that precedes (with
the f operator) the loop i|d1...dj itself, that is, on an unfolding of the loop.

For the sequential operators, pruning needs to be considered only if the
executing action is situated on the right branch of the seq or strict node (if the
action is on the left branch, we have the same transformation as in the par
case). Given i|d1...dj = seq(i|d1...dj.1, i|d1...dj.2) and dj+1 = 2, when constructing
χ(i|d1...dj, dj+1...dn) we must prune in i|d1...dj.1 all the actions that could
interfere with i|p, i.e. those taking place on Θ(i|p). As such, given (i'1, b1) =
prune(i|d1...dj.1, Θ(i|p)), we replace the left branch of the seq with i'1 and
reconstruct:


χ(i|d1...dj, dj+1...dn) = seq(i'1, χ(i|d1...dj+1, dj+2...dn)).

Given that the strict operator will not allow any action from the left branch to
occur after an action on the right has occurred, we can simply prune the whole
left branch, i.e. given i|d1...dj = strict(i|d1...dj.1, i|d1...dj.2) and dj+1 = 2:

χ(i|d1...dj, dj+1...dn) = χ(i|d1...dj+1, dj+2...dn).

Definition 12 (Execution). The function χ : I(L, M) × {1,2}* → I(L, M) is
defined for couples (i, p) with i ∈ I(L, M) and p ∈ front(i) as follows:
- if p = ε then χ(i, p) = ∅
- if p = 1.p1 then
  • if i = f(i1, i2) with f ∈ {strict, seq, par} then χ(i, p) = f(χ(i1, p1), i2)
  • if i = alt(i1, i2) then χ(i, p) = χ(i1, p1)
  • if i = loop_f(i1) with f ∈ {strict, seq, par} then χ(i, p) = f(χ(i1, p1), i)
- if p = 2.p2 then
  • if i = seq(i1, i2) then χ(i, p) = seq(i'1, χ(i2, p2))
    where prune(i1, Θ(i|p)) = (i'1, b1)
  • if i = strict(i1, i2) then χ(i, p) = χ(i2, p2)
  • if i = par(i1, i2) then χ(i, p) = par(i1, χ(i2, p2))
  • if i = alt(i1, i2) then χ(i, p) = χ(i2, p2)
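
Def.12 can be sketched in Python as below. This is an illustrative encoding, not the authors' implementation: interactions are nested tuples, loops are written `("loop", f, body)`, positions are strings over '1'/'2', and the helper `lifeline_at` and the extra parameter `l` (which carries the lifeline of the executed action down the recursion) are introduced for this example. The function assumes that p is a frontier position and does not check it.

```python
EMPTY = ("empty",)


def prune(i, l):
    """Pruning of Def.11: returns (i', b), b True if the branch must go."""
    kind = i[0]
    if kind == "empty":
        return EMPTY, False
    if kind == "act":
        return (EMPTY, True) if i[1] == l else (i, False)
    if kind == "loop":
        body, b = prune(i[2], l)
        return (EMPTY, False) if b else (("loop", i[1], body), False)
    (i1, b1), (i2, b2) = prune(i[1], l), prune(i[2], l)
    if kind == "alt":
        if b1 and b2:
            return EMPTY, True
        return (i2, False) if b1 else (i1, False) if b2 else (("alt", i1, i2), False)
    return (EMPTY, True) if (b1 or b2) else ((kind, i1, i2), False)


def lifeline_at(i, p):
    """Lifeline of the action found at position p of i."""
    while p:
        i = i[2] if i[0] == "loop" else i[int(p[0])]
        p = p[1:]
    return i[1]


def execute(i, p, l=None):
    """Remaining interaction after executing the frontier action at position p."""
    if l is None:
        l = lifeline_at(i, p)
    if p == "":
        return EMPTY
    kind, rest = i[0], p[1:]
    if kind == "loop":            # unfold once and keep the loop afterwards
        return (i[1], execute(i[2], rest, l), i)
    if kind == "alt":             # commit to the chosen branch
        return execute(i[int(p[0])], rest, l)
    if p[0] == "1":               # strict, seq, par: left child
        return (kind, execute(i[1], rest, l), i[2])
    if kind == "seq":             # right of a seq: prune the left cousin
        return ("seq", prune(i[1], l)[0], execute(i[2], rest, l))
    if kind == "strict":          # right of a strict: drop the left branch
        return execute(i[2], rest, l)
    return ("par", i[1], execute(i[2], rest, l))


a_m1, b_m2, a_m3 = ("act", "a", "!", "m1"), ("act", "b", "?", "m2"), ("act", "a", "!", "m3")
fig5 = ("seq", ("alt", a_m1, b_m2), a_m3)
```

Executing position 2 of `fig5` (the action a!m3) prunes branch 11 of the alt and leaves `seq(b?m2, ∅)`, whereas executing position 11 leaves `seq(∅, a!m3)`. The sketch does not simplify away the remaining ∅ terms.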

In Def.13 below, we now define the operational semantics. Note that interactions
that can express the empty trace ε are identified with the predicate exp_ε.
This semantics expresses rules of the form i --i|p--> χ(i, p) where p ∈ front(i).


Definition 13 (Operational semantics for interactions).
We define σ_o : I(L, M) → P(Act(L, M)*) as:


\[ \sigma_o(i) = empty(i) \cup \bigcup_{p \in front(i)} i_{|p} . \sigma_o(\chi(i, p)) \]


with empty(i) = {ε} (resp. ∅) if exp_ε(i) = ⊤ (resp. ⊥)
where exp_ε : I(L, M) → bool is defined as:

4 Back-to-back comparison of both semantics 


Dataset. The recursive definition of interactions as syntactic terms allows us to
characterize them by their depth. Interactions of depth 1 include the empty
interaction ∅ and all actions from Act(L, M). Depending on the cardinals n_l =
Card(L) and n_m = Card(M), those interactions can all be enumerated and
computed. Given a signature, interactions of depth 2 can be deduced from those
of depth 1 and exhaustively computed via the application of the binary and unary
operators (e.g. seq(∅, a!m)). Likewise, interactions of depth 3 can be computed
from those of depths 1 and 2 and so on. To illustrate this, Fig.7 presents for each
couple (n_l, n_m) the numbers of interactions of depths 1, 2 and 3 in each cell. For
instance, we have 3 interactions of depth 1 for n_l = n_m = 1.
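
The counting argument can be checked with a few lines of Python. The recurrence below is a reconstruction from the description above, under the assumption that a term of depth d is an operator applied to children of which at least one has depth d-1 (4 binary operators and 3 loop operators); the function name is hypothetical.

```python
def counts_by_depth(n_l, n_m, max_depth=3):
    """Numbers of interactions of depth exactly 1, 2, ..., max_depth."""
    exact = [1 + 2 * n_l * n_m]  # depth 1: the empty interaction, l!m and l?m
    below = 0                    # interactions strictly shallower than the last depth
    for _ in range(max_depth - 1):
        last = exact[-1]
        upto = below + last
        # binary terms need at least one child of the last depth: all pairs of
        # shallower-or-equal children minus pairs of strictly shallower ones;
        # loop terms need their single child to be of the last depth.
        exact.append(4 * (upto * upto - below * below) + 3 * last)
        below = upto
    return exact
```

For instance `counts_by_depth(3, 3)` gives 19, 1501 and 9244659 interactions of depths 1, 2 and 3, and `counts_by_depth(1, 2)` gives 5, 115 and 57845.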
Experiments. We implemented both semantics (σ_u from Def.8 and σ_o from
Def.13) and compared the sets of traces σ_u(i) and σ_o(i) they generate (with a
stopping criterion on the maximum number of loop unfoldings, 4 in our experiments)
on a significant set of interactions of depth 3 with n_l = n_m = 3. For all of the
234175 selected interactions i from our dataset, the tests systematically
concluded on the equality σ_u(i) = σ_o(i). Although not a proof, our successful
back-to-back comparison strengthens our confidence in both semantics, all
the more so because of the exhaustivity of the subject data set up to maximum
numbers of lifelines, message types, interaction depth (up to 3) and number of loop
unfoldings (up to 4), allowing covering all 2 by 2 combinations of operators.


n_l \ n_m    1                   2                    3
1            3 / 45 / 9315       5 / 115 / 57845      7 / 217 / 201159
2            5 / 115 / 57845     9 / 351 / 519129     13 / 715 / 2121405
3            7 / 217 / 201159    13 / 715 / 2121405   19 / 1501 / 9244659

Fig. 7: Numbers of interactions per n_l, n_m and d (each cell lists depths 1 / 2 / 3)


5 Trace analysis


The definition of the execution function χ (Def.12) that comes with the
operational nature of the σ_o semantics (Def.13) allows us to solve the 'trace
analysis' problem hinted at earlier. Indeed, analysing a trace t = act1...actn
w.r.t. an interaction i0 equates to verifying whether or not there exist
transformations i0 --act1--> χ(i0, p1) = i1, ..., i_{n-1} --actn--> χ(i_{n-1}, pn) = i_n
s.t. i_n accepts the empty trace.

We define an ω function (Def.14) which takes as arguments an interaction i and a
trace t and checks whether or not t is a trace of i. Additional traceability
information is provided using four distinct verdicts:
• Covered is returned when t is a trace of i, i.e. t ∈ σ_o(i);
• TooShort is returned when t ∉ σ_o(i) is a strict prefix of a trace of i, i.e.
∃t' ∈ Act(L, M)* s.t. t.t' ∈ σ_o(i);
• TooLong is returned when neither Covered nor TooShort can be, and given
t = act1...actn, ∃k < n s.t. act1...actk ∈ σ_o(i), i.e. t extends a trace of i;
• Out is returned when none of the others can be.

We define the enumerated type Verdict and provide it with a total order
Out < TooLong < TooShort < Covered.

• If t is empty then: either i accepts the empty trace in its semantics and in this
case ω(i, t) returns Covered, or it returns TooShort.
• If t is of the form act.t' (i.e. not empty and starts with act) then, for all matching
actions i|p in the frontier of i, recursive calls are performed on ω(χ(i, p), t')
and ω(i, t) returns the strongest (max function) verdict among those and either
TooLong if i expresses the empty trace ε or Out if not.


Fig. 8: Application of ω
Definition 14 (Trace Analysis). We define ω : I(L, M) × Act(L, M)* →
Verdict such that ∀(i, t) ∈ I(L, M) × Act(L, M)*:

- ω(i, ε) = Covered (resp. TooShort) if exp_ε(i) = ⊤ (resp. ⊥)

- if t is of the form act.t' then:


\[ \omega(i, t) = \max\Big( out_\varepsilon(i) \cup \{\, \omega(\chi(i, p), t') \mid p \in front(i),\ i_{|p} = act \,\} \Big) \]


with out_ε(i) = {TooLong} (resp. {Out}) if exp_ε(i) = ⊤ (resp. ⊥)
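
The recursion of Def.14 can be sketched generically in Python. In this illustration, the four callbacks stand for front, i|p, χ and exp_ε; to keep the block self-contained they are instantiated on a toy model in which an "interaction" is simply a finite set of traces and a "position" is a first action. This toy model is an assumption of the example and is not the paper's syntax.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Total order: Out < TooLong < TooShort < Covered
    OUT = 0
    TOO_LONG = 1
    TOO_SHORT = 2
    COVERED = 3


def omega(i, trace, front, action_at, execute, expresses_empty):
    """Verdict for trace (a tuple of actions) w.r.t. i, following Def.14."""
    if not trace:
        return Verdict.COVERED if expresses_empty(i) else Verdict.TOO_SHORT
    act, rest = trace[0], trace[1:]
    verdicts = [Verdict.TOO_LONG if expresses_empty(i) else Verdict.OUT]
    for p in front(i):
        if action_at(i, p) == act:
            verdicts.append(omega(execute(i, p), rest, front, action_at,
                                  execute, expresses_empty))
    return max(verdicts)


# Toy stand-in for interactions: a finite set of traces (tuples of actions).
toy = dict(
    front=lambda lang: {t[0] for t in lang if t},
    action_at=lambda lang, p: p,
    execute=lambda lang, p: frozenset(t[1:] for t in lang if t and t[0] == p),
    expresses_empty=lambda lang: () in lang,
)
lang = frozenset({("a!m",), ("a!m", "b?m")})
```

With this toy language, `omega(lang, ("a!m", "b?m"), **toy)` is Covered, the empty trace is TooShort, `("a!m", "b?m", "c?m")` is TooLong and `("c?m",)` is Out.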


Fig.8 is a graphical representation of the ω process when applied to the
interaction from Fig.6-a and the trace a!m4.b?m3.

Fig.9 presents a synthesis of experiments conducted to assess the correctness
of ω and of our implementation of it. We randomly sampled 1000 interactions
from the set of 234175 interactions mentioned in Sec.4. Each of them was tested
with the 18 single action traces from Act(L, M) and we sampled 15 traces from
their semantics (computed with 3 loop unfolds). Each of those traces was tested,
as well as a random selection of their prefixes and of interesting mutants.
Addition (resp. replacement) mutants consist in adding an action to a trace (resp.
prefix). By construction we could classify all those traces according to the
verdicts they are expected to obtain. Fig.9 details those results, showing a
systematic concordance between the expected and obtained verdicts. Those results
reinforce our confidence in ω, all the more so as they were obtained on a panel of
traces and interactions which covers all 2 by 2 combinations of operators.


Fig. 9: Correctness of ω experiments


To provide an evaluation of performances (plotting time vs. length), we needed
a large model and long correct traces. Indeed, the time required by the analysis is
not always correlated to trace length, e.g. an arbitrarily long trace starting with
an action act of position p ∈ front(i) is analyzed immediately, whatever length
it may be. There is however a correlation for correct traces and their prefixes.
We defined a partial high-level model of the MQTT [22] telecommunication
protocol (see Fig.10-a). This model states that a communication session between a
client and a broker starts (resp. ends) with a sequential connection (resp.
disconnection) phase. In between, at any time, any number of instances of one of
the 5 proposed subinteractions can be run concurrently. Hence, we used a
multi-threaded Python script to generate 100 traces, each of those corresponding to
the concurrent activation and execution at random time intervals of 20 instances
of the loop_par from Fig.10-a. All those traces (resp. prefixes) have the verdict
Covered (resp. TooShort); we evaluated computation times and plotted some
of them on Fig.10-b.
The linear regression shows curves with a great variability (some traces need
4 seconds while others only 0.06). In this precise model, it is explained by the
presence of par (via loop_par) operators and by the fact that messages are not
uniquely identified. For instance analyzing t = a!m.b?m on
i = par(a!m, strict(a!m, b?m)) would give rise to 2 branches:
i' = strict(a!m, b?m) (resp. i' = par(a!m, b?m)) with t' = b?m, which ends with
Out (resp. Covered) because m is not uniquely identified. This number
of branches can quickly explode when par operators are stacked, which happens
when the trace describes an execution where many loop content instances overlap.
An applicable solution is to treat message data arguments, given that
communication protocols provide unique ids, e.g. m(id1) ≠ m(id2). In Fig.10-b, on
the plot below, we magnified on traces 9, 34 & 61 which have a very short analysis
time. We can surmise here that minimal (perhaps no) loop overlap occurred as the
derivatives are almost constant (especially for trace 61). In conclusion,
performance highly depends on the model and input trace, but treating data which
specifies unique ids for messages would generalize the best case scenario. In this
case, the algorithm could be applied to monitoring within the limits of an input
frequency that is inferior to the time required to analyze a trace of length 1.


Fig. 10: Performances. (a) mqtt model; (b) time vs. trace length, with a zoom on
traces 9, 34, 61.


6 Related work 


For classical IL such as UML-SD or HMSC, many authors have proposed their 
own takes on formal semantics (see the survey [21] for UML-SD). 

Denotational Semantics. Most existing semantics based on term interpretations
are given in a denotational style [27,14,3,17] and do not follow up with
algorithmic tools. In [27], the authors propose a denotational semantics similar
to ours (Def.5) as far as the strict, alt and par operators are concerned. [14]
proposes a semantics that is a detailed version of the one from [27]. In [17]
there is a distinction (snd(s,r,m)|snd(s,m)|rcv(s,r,m)|rcv(r,m)) between basic
actions according to whether or not the intended receiver or original sender is
the environment. Apart from that, and the absence of loops, the denotational
semantics proposed by [17] is similar to ours. In [3], an institutional
approach, likened to that of [17], is proposed. However, it includes loops and
deals with modalities associated with the neg and assert operators [23] by
separating the semantics into sets of accepted and refused traces. This issue of
modality is also raised in [21] and [13] but it is out of the scope of this
paper.


Translation-based approaches. Most other approaches rely on translations that
map concepts of the given IL into a target formal framework, most often based on
automata [11,2,28,19] or Petri nets [8,5,10]. Although those translations allow
advantageous reuse of the target framework's tools, relying on them to capture
semantics leads to reasoning on foreign concepts. In [11], UML-SDs are
translated into timed automata, which are then verified with the UPPAAL tool
[18]. The translation mechanisms only concern models with synchronous
communications. An observer automaton has to be designed so as to intercept
communications between automata, make them observable, and enter an error state
if other events are observed. In [2], each lifeline is translated into a timed
input output symbolic transition system (TIOSTS) and message passing relies on
some synchronous product. In order to cope with asynchronism, FIFO-based
communication schemes have been introduced to ensure the consistency of
executions on different lifelines. Also, dedicated variables have to be
introduced to keep track of branching choices specified by alt or loop
operators. In [28], a symbolic automaton is built from UML-SD specifications
with the goal of analyzing traces by means of valid, invalid or inconclusive
verdicts. [19] focuses on how to test Message Sequence Charts when the system is
only partially observed. A translation into a network of asynchronous concurrent
automata allows semantics to be defined through a product automaton as in [2].
In [8], UML-SD specifications are translated into multivalued nets (M-nets). The
translation is compositional, entry and exit places of the M-nets corresponding
to subinteractions being connected differently according to the parent combined
fragment. However, this process is complicated by the tracking of actions that
are completely unordered w.r.t. one another. [8] also treats data in the form of
variables, message parameters and guards. In [5], the authors propose an
approach to automatically translate UML-SDs designed with the Papyrus tool [12]
to Coloured Petri Nets (CPNs) in a format compatible with CPNTools [16]. CPNs
come with an execution semantics that is particularly adapted for the
description and analysis of distributed and concurrent systems. In [5], the
translation revolves around a list of 11 rules with different priorities, which
are applied to translate different concepts (lifelines, message occurrences,
combined fragments, etc.) while iterating sequentially through the UML-SD's
elements. In [10], a set of UML-SDs is translated into Extended Petri Nets
(EPNs). Input execution traces can then be checked against the EPNs.


Operational approach. The literature contains few attempts at defining
operational semantics for ILs. In [26], the authors build formal expressions
over a process algebra signature. Starting from axioms such as
$\varepsilon\downarrow$ (the empty process $\varepsilon$ terminates) and
$a \xrightarrow{a} \varepsilon$ ($a$ being an atomic action), an expression
describing an MSC is built using rules such as
$(x \xrightarrow{a} x') \wedge (y \stackrel{a}{\nrightarrow}) \Rightarrow (x \mp y \xrightarrow{a} x')$.
Such an expression is then associated with a transition graph. The contribution
in [26] does not however deal with the loop operator, and it is quite different
from ours as the proposed transformations operate on process-algebraic
expressions and not on syntactic terms. In contrast, the semantics proposed in
[20] relies on syntactic term transformations. Still, it also requires a
communication medium as it is defined as the output of a combination of two
transition systems: an execution system which keeps track of communications, and
a projection system which selects the next action to execute and provides the
resulting interaction. As explained in [9], communication models keep track of
emitted messages and messages pending reception. They can for instance take the
form of a set of dedicated buffers (e.g. FIFO). Our approach has the advantage
of making such communication models implicit.

Discussions. Even though interaction languages specify no synchronisation
mechanisms between lifelines, several approaches that aim to implement tools
impose synchronisation points when entering and exiting combined operators and
at decision points (alt, opt, loop) [28,2,8,21] (more recent works such as
[10,20] do not). Although translation-based approaches have the benefit of
allowing the use of the many existing analysis tools (UPPAAL [18], DIVERSITY
[15], CPNTools [16], etc.), we postulate that direct operational approaches such
as ours facilitate features such as animation and debugging, which become for
the most part free-of-charge by-products of the analysis process.


7 Conclusion 


In this paper we proposed an operational semantics for ILs, aimed at trace
validity analysis. This semantics is built upon a formal syntax for interaction
terms and validated back-to-back w.r.t. a reference denotational semantics. Our
semantics is built on partial order relations induced on messages by the syntax.
Those relations allow the identification of immediately executable actions.
Pruning techniques then ensure a consistent semantics based on successive
transformations of the form $i \xrightarrow{act} i'$. On this principle, we have
defined and implemented algorithms to compute semantics and to analyze the
validity of traces. Experiments were successfully conducted in order to evaluate
the correctness of each.

We intend to enrich our formalism: (1) by expanding trace analysis to a
distributed context, where a set of traces (multi-trace) may be analyzed
concurrently on a subset of observed lifelines; (2) by investigating whether or
not our algorithmic treatments are fast enough to deal with traces on-the-fly so
as to adapt them to monitoring; (3) by extending our IL to include modality
operators such as assert or negate; (4) by allowing the use of message
arguments, variables, clocks and constraints within models.

Additionally, it would be interesting to perform a comparison with translation- 
based approaches. This may consist in a comparison of formal semantics and/or 
in benchmarking implementations according to a certain performance metric. 
Abstract. This report describes the 2020 Competition on Software
Testing (Test-Comp), the 2nd edition of a series of comparative evaluations
of fully automatic software test-case generators for C programs. The 
competition provides a snapshot of the current state of the art in the area, 
and has a strong focus on replicability of its results. The competition 
was based on 3 230 test tasks for C programs. Each test task consisted 
of a program and a test specification (error coverage, branch coverage). 
Test-Comp 2020 had 10 participating test-generation systems. 
1 Introduction


Software testing is as old as software development itself, because the most
straightforward way to find out if the software works is to execute it. In the
last few decades the tremendous breakthroughs of fuzzers¹, theorem provers [40],
and satisfiability-modulo-theory (SMT) solvers [21] have led to the development
of efficient tools for automatic test-case generation. For example, symbolic
execution and the idea to use it for test-case generation [33] have existed for
more than 40 years, yet efficient implementations (e.g., KLEE [16]) had to wait
for the availability of mature constraint solvers. Also, with the advent of
automatic software model checking, the opportunity to extract test cases from
counterexamples arose (see BLAST [9] and JPF [41]). In the following years, many
techniques from the areas of model checking and program analysis were adapted
for the purpose of test-case generation and several strong hybrid combinations
have been developed [24].

There are several powerful software test generators available [24], but they
were difficult to compare. For example, a recent study [11] first had to develop
a framework that supports running test-generation tools on the same program
source code and delivering test cases in a common format for validation.
Furthermore, there was no widely distributed benchmark suite available and
neither input programs nor output test suites followed a standard format. In
software verification, the competition SV-COMP [3] helped to overcome the
problem: the competition community developed standards for defining
nondeterministic functions and a language to write specifications (so far for C
and Java programs) and established a standard exchange format for the output
(witnesses). A competition event with high visibility can foster the transfer of
theoretical and conceptual advancements in the area of software testing into
practical tools.


1 http://lcamtuf.coredump.cx/afl/

The annual Competition on Software Testing (Test-Comp) [4,5]² is the
showcase of the state of the art in the area, in particular, of the effectiveness
and efficiency that is currently achieved by tool implementations of the most
recent ideas, concepts, and algorithms for fully automatic test-case generation.
Test-Comp uses the benchmarking framework BenchExec [12], which is already
successfully used in other competitions, most prominently, all competitions that
run on the StarExec infrastructure [39]. Similar to SV-COMP, the test generators
in Test-Comp are applied to programs in a fully automatic way. The results are
collected via BenchExec's XML results format, and transformed into tables and
plots in several formats.³ All results are available in artifacts at Zenodo (Table 3).


Competition Goals. In summary, the goals of Test-Comp are the following: 


• Establish standards for software test generation. This means, most
  prominently, to develop a standard for marking input values in programs,
  define an exchange format for test suites, agree on a specification language
  for test-coverage criteria, and define how to validate the resulting test
  suites.

• Establish a set of benchmarks for software testing in the community. This
  means to create and maintain a set of programs together with coverage
  criteria, and to make those publicly available for researchers to be used in
  performance comparisons when evaluating a new technique.

• Provide an overview of available tools for test-case generation and a snapshot
  of the state-of-the-art in software testing to the community. This means to
  compare, independently from particular paper projects and specific techniques,
  different test-generation tools in terms of effectiveness and performance.

• Increase the visibility and credits that tool developers receive. This means
  to provide a forum for presentation of tools and discussion of the latest
  technologies, and to give the students the opportunity to publish about the
  development work that they have done.

• Educate PhD students and other participants on how to set up performance
  experiments, how to package tools in a way that supports replication, and how
  to perform robust and accurate research experiments.

• Provide resources to development teams that do not have sufficient computing
  resources and give them the opportunity to obtain results from experiments
  on large benchmark sets.


Related Competitions. In other areas, there are several established
competitions. For example, there are three competitions in the area of software
verification: (i) a competition on automatic verifiers under controlled
resources (SV-COMP [3]), (ii) a competition on verifiers with arbitrary
environments (RERS [27]), and (iii) a competition on interactive verification
(VerifyThis [28]). An overview of 16 competitions in the area of formal methods
was presented at the TOOLympics events at the conference TACAS in 2019 [1]. In
software testing, there are several competition-like events, for example, the
DARPA Cyber Grand Challenge [38]⁴, the IEEE International Contest on Software
Testing⁵, the Software Testing World Cup⁶, and the Israel Software Testing World
Cup⁷. Those contests are organized as on-site events, where teams of people
interact with certain testing platforms in order to achieve a certain coverage
of the software under test. There are two competitions for automatic and
off-site testing: Rode0day⁸ is a competition that is meant as a continuously
running evaluation on bug-finding in binaries (currently Grep and SQLite). The
unit-testing tool competition [32]⁹ is part of the SBST workshop and compares
tools for unit-test generation on Java programs. There was no comparative
evaluation of automatic test-generation tools for whole C programs in source
code, in a controlled environment, and Test-Comp was founded to close this gap
[4]. The results of the first edition of Test-Comp were presented as part of the
TOOLympics 2019 event [1] and in the Test-Comp 2019 competition report [5].


2 https://test-comp.sosy-lab.org
3 https://test-comp.sosy-lab.org/2020/results/


2 Definitions, Formats, and Rules 


Organizational aspects such as the classification (automatic, off-site, reproducible,
jury, training) and the competition schedule are given in the initial
competition definition [4]. In the following we repeat some important definitions that
are necessary to understand the results.


Test Task. A test task is a pair of an input program (program under test) and 
a test specification. A test run is a non-interactive execution of a test generator 
on a single test task, in order to generate a test suite according to the test 
specification. A test suite is a sequence of test cases, given as a directory of 
files according to the format for exchangeable test suites.¹⁰


Execution of a Test Generator. Figure 1 illustrates the process of executing 
one test generator on the benchmark suite. One test run for a test generator gets as 
input (i) a program from the benchmark suite and (ii) a test specification (find bug, 
or coverage criterion), and returns as output a test suite (i.e., a set of test cases). 
The test generator is contributed by a competition participant. The test runs are 
executed centrally by the competition organizer. The test validator takes as input 
the test suite from the test generator and validates it by executing the program 
on all test cases: for bug finding it checks if the bug is exposed and for coverage 
it reports the coverage. We use the tool TestCov [14]¹¹ as test-suite validator.


4 https://www.darpa.mil/program/cyber-grand-challenge/
5 http://paris.utdallas.edu/qrs18/contest.html
6 http://www.softwaretestingworldcup.com/
7 https://www.inflectra.com/Company/Article/480.aspx
8 https://rode0day.mit.edu/
9 https://sbst19.github.io/tools/
10 https://gitlab.com/sosy-lab/software/test-format/
11 https://gitlab.com/sosy-lab/software/test-suite-validator
Fig. 1: Flow of the Test-Comp execution for one test generator


Table 1: Coverage specifications used in Test-Comp 2020 (same as in 2019) 


Formula                                Interpretation

COVER EDGES(@CALL(__VERIFIER_error))   The test suite contains at least one test
                                       that executes function __VERIFIER_error.
COVER EDGES(@DECISIONEDGE)             The test suite contains tests such that
                                       all branches of the program are executed.


Test Specification. The specification for testing a program is given to the 
test generator as input file (either properties/coverage-error-call.prp or 
properties/coverage-branches.prp for Test-Comp 2020). 

The definition init(main()) is used to define the initial states of 
the program under test by a call of function main (with no parame- 
ters). The definition FQL(f) specifies that coverage definition f should 
be achieved. The FQL (FShell query language [26]) coverage definition
COVER EDGES(@DECISIONEDGE) means that all branches should be covered,
COVER EDGES(@BASICBLOCKENTRY) means that all statements should be covered,
and COVER EDGES(@CALL(__VERIFIER_error)) means that calls to function
__VERIFIER_error should be covered. A complete specification looks like:
COVER( init(main()), FQL(COVER EDGES(@DECISIONEDGE)) ).
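As a small illustration of how the pieces above compose, the following sketch assembles a complete specification string from an FQL coverage definition. The helper name `coverage_spec` and the constants are hypothetical; the exact whitespace of the property files shipped with the competition is an assumption based on the example in the text.

```python
# FQL coverage definitions quoted in the text.
BRANCH_COVERAGE = "COVER EDGES(@DECISIONEDGE)"
STATEMENT_COVERAGE = "COVER EDGES(@BASICBLOCKENTRY)"
ERROR_CALL = "COVER EDGES(@CALL(__VERIFIER_error))"


def coverage_spec(fql_formula, entry="main"):
    """Wrap an FQL coverage definition into a complete test specification.

    `entry` names the function whose call defines the initial states.
    """
    return f"COVER( init({entry}()), FQL({fql_formula}) )"


if __name__ == "__main__":
    print(coverage_spec(BRANCH_COVERAGE))
    print(coverage_spec(ERROR_CALL))
```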

Table 1 lists the two FQL formulas that are used in test specifications of 
Test-Comp 2020; there was no change from 2019. The first describes a formula 
that is typically used for bug finding: the test generator should find a test case 
that executes a certain error function. The second describes a formula that is 
used to obtain a standard test suite for quality assurance: the test generator 
should find a test suite for branch coverage. 


License and Qualification. The license of each participating test generator 
must allow its free use for replication of the competition experiments. Details on 
qualification criteria can be found in the competition report of Test-Comp 2019 [5]. 
3 Categories and Scoring Schema


Benchmark Programs. The input programs were taken from the largest and
most diverse open-source repository of software verification tasks¹², which is
also used by SV-COMP [3]. As in 2019, we selected all programs for which the
following properties were satisfied (see issue on GitHub¹³ and report [5]):

1. compiles with gcc, if a harness for the special methods¹⁴ is provided,
2. should contain at least one call to a nondeterministic function,
3. does not rely on nondeterministic pointers,
4. does not have expected result ‘false’ for property ‘termination’, and
5. has expected result ‘false’ for property ‘unreach-call’ (only for category
   Error Coverage).


This selection yielded a total of 3 230 test tasks, namely 699 test tasks for category 
Error Coverage and 2531 test tasks for category Code Coverage. The test tasks 
are partitioned into categories, which are listed in Tables 6 and 7 and described in 
detail on the competition web site.¹⁵ Figure 2 illustrates the category composition.


Category Error-Coverage. The first category is to show the abilities to
discover bugs. The benchmark set for this category consists of programs that
contain a bug. Every run will be started by a batch script, which produces for
every tool and every test task (a C program together with the test
specification) one of the following scores: 1 point, if the validator succeeds
in executing the program under test on a generated test case that exposes the
bug (i.e., the specified function was called), and 0 points, otherwise.


Category Branch-Coverage. The second category is to cover as many branches 
of the program as possible. The coverage criterion was chosen because many 
test-generation tools support this standard criterion by default. Other coverage 
criteria can be reduced to branch coverage by transformation [25]. Every run will 
be started by a batch script, which produces for every tool and every test task 
(a C program together with the test specification) the coverage of branches of 
the program (as reported by TestCov [14]; a value between 0 and 1) that are
executed for the generated test cases. The score is the returned coverage. 
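The two per-task scoring rules can be summarized in a few lines. This is a minimal sketch with hypothetical function names; in the competition the inputs come from the test-suite validator, and treating a program without branches as score 0 is an assumption made here only to avoid a division by zero.

```python
def error_coverage_score(error_function_called: bool) -> int:
    """Category Error-Coverage: 1 point if a generated test case led the
    validator to a call of the specified function, 0 points otherwise."""
    return 1 if error_function_called else 0


def branch_coverage_score(covered_branches: int, total_branches: int) -> float:
    """Category Branch-Coverage: the fraction of branches executed by the
    generated test cases, a value between 0 and 1."""
    if total_branches == 0:
        return 0.0  # assumption: no branches means nothing to cover
    return covered_branches / total_branches


def category_score(task_scores):
    """Sum of the per-task scores of one tool in one category."""
    return sum(task_scores)
```

For example, `category_score([branch_coverage_score(3, 4), branch_coverage_score(1, 2)])` evaluates to 1.25.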


Ranking. The ranking was decided based on the sum of points (normalized for 
meta categories). In case of a tie, the ranking was decided based on the run time, 
which is the total CPU time over all test tasks. Opt-out from categories was 
possible and scores for categories were normalized based on the number of tasks 
per category (see competition report of SV-COMP 2013 [2], page 597). 
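To make the normalization concrete, the sketch below assumes the scheme amounts to dividing each category score by the number of tasks in that category, summing the results, and scaling by the average number of tasks per category. This reading of the cited report is an assumption, and the function name is hypothetical. Applied to the rounded VeriFuzz scores of Table 6 (636 of 699 tasks and 1577 of 2531 tasks), it gives roughly 2476, which agrees with the Overall column; other rows can differ by a point because the published category scores are themselves rounded.

```python
def normalized_meta_score(categories):
    """Normalized score of a meta category.

    `categories` is a list of (score, number_of_tasks) pairs, one per
    sub-category. Assumed scheme: sum of score/tasks, scaled by the
    average number of tasks per sub-category.
    """
    total_tasks = sum(tasks for _, tasks in categories)
    average_size = total_tasks / len(categories)
    return sum(score / tasks for score, tasks in categories) * average_size


if __name__ == "__main__":
    # Rounded scores from Table 6: Cover-Error and Cover-Branches.
    verifuzz = [(636, 699), (1577, 2531)]
    print(round(normalized_meta_score(verifuzz)))
```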


12 https://github.com/sosy-lab/sv-benchmarks
13 https://github.com/sosy-lab/sv-benchmarks/pull/774
14 https://test-comp.sosy-lab.org/2020/rules.php
15 https://test-comp.sosy-lab.org/2020/benchmarks.php
Fig. 2: Category structure for Test-Comp 2020
Fig. 3: Test-Comp components and the execution flow. Components shown:
(a) Test-Generation Tasks, (b) Benchmark Definitions, (c) Tool-Info Modules,
(d) Tester Archives, (e) Test-Generation Run, (f) Test Suite


Table 2: Publicly available components for replicating Test-Comp 2020 


Component              Fig. 3  Repository                                   Version
Test-Generation Tasks  (a)     github.com/sosy-lab/sv-benchmarks            testcomp20
Benchmark Definitions  (b)     gitlab.com/sosy-lab/test-comp/bench-defs     testcomp20
Tool-Info Modules      (c)     github.com/sosy-lab/benchexec                2.5.1
Tester Archives        (d)     gitlab.com/sosy-lab/test-comp/archives-2020  testcomp20
Benchmarking           (e)     github.com/sosy-lab/benchexec                2.5.1
Test-Suite Format      (f)     gitlab.com/sosy-lab/software/test-format     testcomp20


4 Reproducibility 


In order to support independent replication of the Test-Comp experiments, 
we made all major components that are used for the competition available in 
public version repositories. An overview of the components that contribute to 
the reproducible setup of Test-Comp is provided in Fig. 3, and the details are 
given in Table 2. We refer to the report of Test-Comp 2019 [5] for a thorough 
description of all components of the Test-Comp organization and how we ensure 
that all parts are publicly available for maximal replicability. 


In order to guarantee long-term availability and immutability of the test- 
generation tasks, the produced competition results, and the produced test suites, 
we also packaged the material and published it at Zenodo. The DOIs and 
references are listed in Table 3. The archive for the competition results includes
the raw results in BenchExec's XML exchange format, the log output of the test
generators and validator, and a mapping from file names to SHA-256 hashes.
The hashes of the files are useful for validating the exact contents of a file, and
accessing the files inside the archive that contains the test suites.
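Validating a file against such a hash can be done with a few lines of standard-library Python. The sketch below is illustrative only: the function names are hypothetical, and how the mapping from file names to hashes is stored inside the archive is not assumed here, so the expected hash is passed in directly.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so that large test suites or logs do not
    have to fit into memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected_hash(path, expected_hex):
    """Check a file's contents against an expected SHA-256 hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```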


To provide transparent access to the exact versions of the test generators that
were used in the competition, all tester archives are stored in a public Git
repository. GitLab was used to host the repository for the tester archives due to
its generous repository size limit of 10 GB. The final size of the Git repository
is 1.47 GB.
Table 3: Artifacts published for Test-Comp 2020


Content                  DOI                     Reference

Test-Generation Tasks    10.5281/zenodo.3678250  [7]
Competition Results      10.5281/zenodo.3678264  [6]
Test Suites (Witnesses)  10.5281/zenodo.3678275  [8]


Table 4: Competition candidates with tool references and representing jury members 


Participant  Ref.     Jury member             Affiliation

CoVeriTest   [10,31]  Marie-Christine Jakobs  TU Darmstadt, Germany
ESBMC        [22,23]  Lucas Cordeiro          U. of Manchester, UK
HybridTiger  [15,37]  Sebastian Ruland        TU Darmstadt, Germany
KLEE         [17]     Martin Nowack           Imperial College London, UK
Legion       [36]     Gidon Ernst             LMU Munich, Germany
LibKluzzer   [34]     Hoang M. Le             U. of Bremen, Germany
PRTest       [35]     Thomas Lemberger        LMU Munich, Germany
Symbiotic    [18,19]  Marek Chalupa           Masaryk U., Czechia
Tracer-X     [29,30]  Joxan Jaffar            Nat. U. of Singapore, Singapore
VeriFuzz     [20]     Raveendra Kumar M.      Tata Consultancy Services, India


5 Results and Discussion 


For the second time, the competition experiments represent the state of the 
art in fully automatic test-generation for whole C programs. The report helps 
in understanding the improvements compared to last year, in terms of effec- 
tiveness (test coverage, as accumulated in the score) and efficiency (resource 
consumption in terms of CPU time). All results mentioned in this article were 
inspected and approved by the participants. 


Participating Test Generators. Table 4 provides an overview of the participat- 
ing test-generation systems and references to publications, as well as the team rep- 
resentatives of the jury of Test-Comp 2020. (The competition jury consists of the 
chair and one member of each participating team.) Table 5 lists the features and 
technologies that are used in the test-generation tools. An online table with
information about all participating systems is provided on the competition web site.¹⁶


Computing Resources. The computing environment and the resource limits
were mainly the same as for Test-Comp 2019 [5]: Each test run was limited to
8 processing units (cores), 15 GB of memory, and 15 min of CPU time. The
test-suite validation was limited to 2 processing units, 7 GB of memory, and 5 h
of CPU time (was 3 h for Test-Comp 2019). The machines for running the
experiments are part of a compute cluster that consists of 168 machines; each
test-generation run was executed on an otherwise completely unloaded, dedicated
machine, in order to achieve precise measurements. Each machine had one Intel
Xeon E3-1230 v5 CPU, with 8 processing units each, a frequency of 3.4 GHz, 33 GB
of RAM, and a GNU/Linux operating system (x86_64-linux, Ubuntu 18.04 with Linux
kernel 4.15). We used BenchExec [12] to measure and control computing resources
(CPU time, memory, CPU energy) and VerifierCloud¹⁷ to distribute, install,
run, and clean-up test-case generation runs, and to collect the results. The values
for time and energy are accumulated over all cores of the CPU. To measure the
CPU energy, we use CPU Energy Meter [13] (integrated in BenchExec [12]).
Further technical parameters of the competition machines are available in the
repository that also contains the benchmark definitions.¹⁸


16 https://sv-comp.sosy-lab.org/2020/systems.php


Table 5: Technologies and features that the competition candidates offer

One complete test-generation execution of the competition consisted of
29 899 single test-generation runs. The total CPU time was 178 days and the
consumed energy 49.9 kWh for one complete competition run for test-generation
(without validation). Test-suite validation consisted of 29 899 single test-suite
validation runs. The total consumed CPU time was 632 days. Each tool was
executed several times, in order to make sure no installation issues occur
during the execution. Including preruns, the infrastructure managed a total of
401 156 test-generation runs (consuming 1.8 years of CPU time) and 527 805
test-suite validation runs (consuming 6.5 years of CPU time). We did not
measure the CPU energy during preruns.


17 https://vcloud.sosy-lab.org
18 https://gitlab.com/sosy-lab/test-comp/bench-defs/tree/testcomp20


514 D. Beyer


Table 6: Quantitative overview over all results; empty cells mark opt-outs


Participant  Cover-Error  Cover-Branches  Overall
             699 tasks    2531 tasks      3230 tasks

CoVeriTest   405          1412            1836
ESBMC        506
HybridTiger  394          1351            1772
KLEE         502          1342            2017
Legion       302          1257            1501
LibKluzzer   630          1597            2474
PRTest       66           545             500
Symbiotic    435          849             1548
Tracer-X     373          1244            1654
VeriFuzz     636          1577            2476


Quantitative Results. Table 6 presents the quantitative overview of all tools 
and all categories. The head row mentions the category and the number of test 
tasks in that category. The tools are listed in alphabetical order; every table 
row lists the scores of one test generator. We indicate the top three candidates 
by formatting their scores in bold face and in larger font size. An empty table 
cell means that the tester opted-out from the respective main category (perhaps 
participating in subcategories only, restricting the evaluation to a specific topic). 
More information (including interactive tables, quantile plots for every category, 
and also the raw data in XML format) is available on the competition web site¹⁹
and in the results artifact (see Table 3). Table 7 reports the top three testers for 
each category. The consumed run time (column ‘CPU Time’) is given in hours 
and the consumed energy (column ‘Energy’) is given in kWh. 


Score-Based Quantile Functions for Quality Assessment. We use score- 
based quantile functions [12] because these visualizations make it easier to 
understand the results of the comparative evaluation. The web site 1°? and the 


19 https://test-comp.sosy-lab.org/2020/results 
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Table 7: Overview of the top-three test generators for each category (measurement 
values for CPU time and energy rounded to two significant digits) 


Rank Verifier Score CPU Energy 
Time 


(inh) (in kWh) 


Cover-Error 


1 VERIFUZZ 636 17 .22 
2 LIBKLUZZER 630 130 1.3 
3 EsBMc 506 9.5 11 
Cover-Branches 

1 LiBKLUZZER 1597 540 5.6 
2 VERIFUZZ 1577 590 7.5 
3 CoVERITEST 1412 430 4.4 
Overall 

1 VERIFUZZ 2476 610 7.7 
2 LiBKLUZZER 2474 670 6.9 
3 KLEE 2017 460 5.2 


CoVeriTest —K— 
3000 + ESBMC —— 
HybridTiger —¥— 
KLEE -E 
Legion —><— 
2500 F  LibKluzzer —-— 
PRTest —O— 
Symbiotic —O— 
Tracer-X —OQ— 
VeriFuzz 


wl 
i 


2000 F 


1500 f 


1000 
sool a a | 


0 500 1000 1500 2000 2500 
Cumulative score 


Min. number of test tasks 


Fig. 4: Quantile functions for category Overall. Each quantile function illustrates 
the quantile (#-coordinate) of the scores obtained by test-generation runs below a 
certain number of test tasks (y-coordinate). More details were given previously [5]. 
A logarithmic scale is used for the time range from 1s to 1000s, and a linear 
scale is used for the time range between 0s and 1s. 
Table 8: Alternative rankings; quality is given in score points (sp), CPU time in
hours (h), energy in kilo-watt-hours (kWh), the rank measure in joule per score
point (J/sp); measurement values are rounded to 2 significant digits

Rank   Verifier    Quality  CPU Time  CPU Energy  Rank Measure
                   (sp)     (h)       (kWh)       (J/sp)

Green Testers
1      Symbiotic   1548      41       0.50         1.2
2      Legion      1501     160       1.8          4.4
3      TracerX     1654     310       3.8          8.3
worst                                             53


results artifact (Table 3) include such a plot for each category; as example, we 
show the plot for category Overall (all test tasks) in Fig. 4. A total of 9 testers 
(all except ESBMC) participated in category Overall, for which the quantile plot 
shows the overall performance over all categories (scores for meta categories 
are normalized [2]). A more detailed discussion of score-based quantile plots for 
testing is provided in the previous competition report [5]. 


Alternative Ranking: Green Test Generation - Low Energy Consumption.
Since a large part of the cost of test generation is caused by the
energy consumption, it might be important to also consider the energy efficiency
in rankings, as a complement to the official Test-Comp ranking. The energy is
measured using CPU Energy Meter [13], which we use as part of BenchExec [12].
Table 8 is similar to Table 7, but contains the alternative ranking category 
Green Testers. Column ‘Quality’ gives the score in score points, column ‘CPU 
Time’ the CPU usage in hours, column ‘CPU Energy’ the CPU usage in kWh, 
column ‘Rank Measure’ uses the energy consumption per score point as rank
measure: $\frac{\text{total CPU energy}}{\text{total score}}$, with the unit J/sp.
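
The rank measure can be recomputed from the rounded entries of Table 8. The following is a minimal sketch, not part of the competition tooling; the function and variable names are illustrative assumptions, and the only external fact used is the conversion 1 kWh = 3600 kJ. Because the inputs are already rounded to two significant digits, the recomputed values can differ from the tabulated ones in the last digit, and they are reproduced here on a kilojoule scale.

```python
# Energy per score point, recomputed from the rounded values in Table 8.
# Names are illustrative; results are expressed in kJ per score point.

KWH_TO_KJ = 3600.0  # 1 kWh = 3.6 MJ = 3600 kJ


def energy_per_score_point(energy_kwh, score_points):
    """Return the consumed CPU energy per score point in kJ/sp."""
    return energy_kwh * KWH_TO_KJ / score_points


# (CPU energy in kWh, quality in score points), copied from Table 8
green_testers = {
    "Symbiotic": (0.50, 1548),
    "Legion": (1.8, 1501),
    "TracerX": (3.8, 1654),
}

if __name__ == "__main__":
    for name, (energy, score) in green_testers.items():
        print(f"{name}: {energy_per_score_point(energy, score):.2g} kJ/sp")
```
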


6 Conclusion 


Test-Comp 2020, the 2nd edition of the Competition on Software Testing, attracted
10 participating teams. The competition offers an overview of the state of the art in
automatic software testing for C programs. The competition does not only execute
the test generators and collect results, but also validates the achieved coverage
of the test suites, based on the latest version of the test-suite validator TestCov.
The number of test tasks was increased to 3230 (from 2356 in Test-Comp 2019).
As before, the jury and the organizer made sure that the competition follows the 
high quality standards of the FASE conference, in particular with respect to the 
important principles of fairness, community support, and transparency. 
Abstract. In theory, software model checkers are well-suited for automated
test-case generation. The idea is to perform (non-)reachability
queries for the test goals and extract test cases from resulting counterexamples.
However, in case of realistic programs, even simple coverage
criteria (e.g., branch coverage) force model checkers to deal with several
hundreds or even thousands of test goals. Processing each of these
test goals in isolation with model checking techniques does not scale.
Therefore, our tool HybridTiger builds on recent ideas on multi-property
verification. However, since every additional property (i.e., test goal)
reduces the model checker’s abstraction possibilities, we split the set of
all test goals into different partitions. In Test-Comp 2019, we applied
a random partitioning strategy and used predicate analysis as model
checking technique. In Test-Comp 2020, we improved our technique in
two ways. First, we exploit domination information among control-flow
locations in our partitioning strategy to group test goals that are located
on (preferably) similar paths. Second, we account for inherent weaknesses
of the predicate analysis by applying a hybrid software model-checking
approach that switches between explicit model checking and
predicate-based model checking on-the-fly. Our tool HybridTiger is integrated into
the software analysis framework CPAchecker.


Keywords: CPAchecker - Test-Goal Set Partitioning - Hybrid Model- 
Checking Cooperation 


1 Software Architecture 


The HybridTiger algorithm is implemented within the software verification
framework CPAchecker [4]. CPAchecker utilizes the Eclipse CDT C-parser³.


3 https://www.eclipse.org/cdt/
    int fib(int n) {
      if (n <= 0) return -1;
      if (n == 1) return 1;
      if (n == 2) return 1;
      return fib(n-1)
           + fib(n-2);
    }

(a) C program        (b) CFA [graph not reproduced]


Fig. 1. C Program to calculate the Fibonacci number of n and corresponding CFA 


CPAchecker allows developers to easily integrate new algorithms like HybridTiger,
which may use other algorithms implemented in CPAchecker, such as
counterexample-guided abstraction refinement (CEGAR) [5]. Additionally, new
reachability analyses can be integrated as configurable program analyses
(CPAs) [2]. Each CPA consists of an abstract domain with the operators post,
merge, and stop. Multiple CPAs can also be combined into one CPA.
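
To illustrate how the three operators interact, the following is a minimal sketch of a generic worklist-based reachability algorithm parameterized by post, merge, and stop. It is written in Python for brevity and is not the CPAchecker (Java) API; the toy CFA, the location-only abstract domain, and all names are assumptions made for this example.

```python
from collections import deque


def cpa_algorithm(initial, post, merge, stop):
    """Compute the set of reached abstract states for the given operators."""
    reached = {initial}
    waitlist = deque([initial])
    while waitlist:
        state = waitlist.popleft()
        for successor in post(state):
            # Combine the successor with already reached states.
            for old in list(reached):
                merged = merge(successor, old)
                if merged != old:
                    reached.discard(old)
                    reached.add(merged)
                    if old in waitlist:
                        waitlist.remove(old)
                    waitlist.append(merged)
            # Keep the successor only if it is not covered yet.
            if not stop(successor, reached):
                reached.add(successor)
                waitlist.append(successor)
    return reached


# Toy instance: abstract states are CFA node IDs (hypothetical graph).
cfa = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [3]}  # node 4 is unreachable


def post(node):
    return cfa[node]


def merge_sep(new, old):
    return old  # never merge; keep states separate


def stop_sep(state, reached):
    return state in reached  # covered if already reached


if __name__ == "__main__":
    print(sorted(cpa_algorithm(0, post, merge_sep, stop_sep)))
```

With a merge operator that never combines states and a stop operator that checks plain membership, the sketch degenerates to an ordinary graph search; richer domains change only the three operators, not the loop.
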

HybridTiger uses the CoVeriTest [3] algorithm to sequentially combine
test-case generation runs utilizing different verification techniques. Each test-case
generation run applies the CPA/Tiger-MGP⁴ (Tiger Multi-Goal-Partitioning)
algorithm, which utilizes the CEGAR algorithm.


2 Test-Generation Approach 


HybridTiger first extracts test goals from input programs and repeatedly
executes reachability analyses provided by CPAchecker until every reachable test
goal is covered by at least one test case. To this end, test goals are encoded into
(non-)reachability properties. If a test goal has been reached, CPAchecker
returns a counterexample and HybridTiger extracts a test case (i.e., a vector of
input values), writes the test case to disk, and marks the test goal as covered.
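
The loop described above can be sketched as follows. This is only an illustration under strong simplifying assumptions: a brute-force search over a few candidate inputs stands in for the model checker, the goal labels and helper names are hypothetical, and goals that a generated test case covers incidentally are also marked as covered.

```python
def branches_taken(n):
    """Toy stand-in for the program under test: the branch label hit by input n."""
    if n <= 0:
        return {"n<=0"}
    if n == 1:
        return {"n==1"}
    if n == 2:
        return {"n==2"}
    return {"recursive"}


def find_counterexample(goal, candidates):
    """Stand-in for a reachability query: an input reaching the goal, or None."""
    for n in candidates:
        if goal in branches_taken(n):
            return n
    return None


def generate_tests(goals, candidates):
    """Return (test suite, goals for which no input was found)."""
    uncovered = list(goals)
    suite, unreachable = [], []
    while uncovered:
        goal = uncovered.pop(0)
        test_case = find_counterexample(goal, candidates)
        if test_case is None:
            unreachable.append(goal)
            continue
        suite.append(test_case)
        # Mark every remaining goal that this test case also covers.
        covered = branches_taken(test_case)
        uncovered = [g for g in uncovered if g not in covered]
    return suite, unreachable


if __name__ == "__main__":
    goals = ["n<=0", "n==1", "n==2", "recursive", "dead"]
    print(generate_tests(goals, range(-1, 5)))
```
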


Hybrid Test-Case Generation. HybridTiger receives as inputs a C program and
a property specification (i.e., a set of test goals). Next, HybridTiger transforms
the C program into a control-flow automaton (CFA) [1]. Figure 1 shows an
example C program and the corresponding CFA. After CFA generation, the
CoVeriTest algorithm as configured in HybridTiger (see Fig. 2) is executed.
In every new iteration, each analysis of our configuration first (re-)partitions
the set of uncovered test goals (e.g., partitions P1, P2, P3 and P4 for
CPA/Tiger-MGP-Value and P1 and P2 for CPA/Tiger-MGP-Predicate in Fig. 2). In
each iteration, CPA/Tiger-MGP-Value is performed first using explicit model
checking and is stopped after 120s. After that, CPA/Tiger-MGP-Predicate is


4 https://www.es.tu-darmstadt.de/es/team/sebastian-ruland/testcomp19/

[Figure: diagram with the labels "Interleaved Algorithm", "Tiger-MGP-Value",
"Tiger-MGP-Predicate", and "VA-Partitioning"]

Fig. 2. Overview of HybridTiger


executed using predicate model checking for 780s, where the overall iteration 
stops after reaching the global time limit. 


Partitioning. HybridTiger utilizes domination information of test-goal locations
according to the respective CFA paths. This meta-information is retrieved from
the generated CFA: each CFA node (i.e., basic block of program locations) in
Fig. 1 is annotated with a post-order ID such that a node will only be reached
after all nodes on the same path with a larger ID have been reached at least
once. Hence, we use the IDs of the predecessor nodes of the CFA edges that
represent test goals as the sorting criterion for the overall set of test goals
before splitting this set into partitions of predefined sizes. In this way, test
goals sharing similar paths are more likely to be assigned to the same partition,
which makes it easier to reuse reachability information during reachability analysis.
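
The sort-then-split step can be sketched as below. The representation of a test goal as a (name, predecessor post-order ID) pair, the ascending sort direction, and the function name are assumptions made for this illustration, not details of the HybridTiger implementation.

```python
def partition_goals(goals, partition_size):
    """Sort goals by the post-order ID of their predecessor node, then chunk.

    goals: iterable of (goal_name, predecessor_post_order_id) pairs.
    Returns a list of partitions, each with at most partition_size goals.
    """
    ordered = sorted(goals, key=lambda goal: goal[1])
    return [
        ordered[i:i + partition_size]
        for i in range(0, len(ordered), partition_size)
    ]


if __name__ == "__main__":
    goals = [("g1", 5), ("g2", 1), ("g3", 4), ("g4", 2), ("g5", 3)]
    for partition in partition_goals(goals, 2):
        print(partition)
```

Goals whose predecessor nodes have neighboring IDs end up in the same chunk, which is the property the partitioning relies on.
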


3 Strengths and Weaknesses 


HybridTiger has three main strengths. First, the directed generation of test cases
aiming at covering particular test goals significantly reduces the overall number
of test cases. Additionally, most test cases produced by HybridTiger effectively
increase the overall coverage (i.e., HybridTiger produces mostly correct and
non-redundant test cases). Second, HybridTiger uses control-flow information to
partition test goals, which potentially enhances the efficiency of test-case generation
due to information reuse among similar test goals. Lastly, HybridTiger uses
combinations of different analysis strategies (i.e., value analysis and predicate
analysis) to cope with the structural diversity of input programs. One weakness of
HybridTiger is that the partitioning approach does not improve on a goal-by-goal
approach when applied to programs with a small number of test goals (e.g.,
reaching one single error location as demanded in the Cover-Error category).
Results. In Test-Comp 2020, HybridTiger participated in all categories and
reached the 4th rank in Code Coverage and the 6th rank in Finding
Bugs; HybridTiger performed better on tasks with many test goals.


4 Setup and Configuration 


The version of HybridTiger submitted to Test-Comp 2020 is built from the
tigerIntegration2⁵ branch, revision 32283, of the CPAchecker repository and is
archived at https://gitlab.com/sosy-lab/test-comp/archives-2020. HybridTiger
can be applied to a single file using the command


    scripts/cpa.sh -benchmark -heap 10000M -tigertestcomp20
        -spec spec.prp file


where spec is the property file (e.g., coverage-error-call or coverage-branches)
and file is the input C program. Statistics of the analyses are printed to the console,
and metadata on generated test cases as well as the test suite are written to
files in the output folder. To run HybridTiger on the Test-Comp 2020
benchmarks, a Linux system with Java 8, BenchExec⁶, and the SV-benchmarks⁷
is required. Finally, run BenchExec with:


- the benchmark definition cpa-tiger.xml (archived at
  https://gitlab.com/sosy-lab/test-comp/bench-defs/tree/master/benchmark-defs), and

- the tool-info module cpachecker.py (archived at
  https://github.com/sosy-lab/benchexec/tree/master/benchexec/tools).


5 Project and Contributors 


CPAchecker is maintained by the Software Systems Lab at LMU Munich
as an open-source project, with contributions from an international group of
researchers from LMU Munich, University of Passau, Technical University of Darmstadt
and the Institute for System Programming of the Russian Academy of Sciences. The
branch tigerIntegration2, from which HybridTiger is built, is mainly developed
at the Technical University of Darmstadt. Additional information is available
at https://cpachecker.sosy-lab.org/.


Acknowledgement. This work was funded by the Hessian LOEWE initiative 
within the Software-Factory 4.0 project. 
Abstract. ESBMC is an SMT-based bounded model checker for real-world
C programs. Such programs often represent real numbers using
floating-point numbers, most commonly following the IEEE floating-point
standard (IEEE 754-2008). Thus, ESBMC now includes a new floating-point
arithmetic encoding layer in its SMT backend that encodes floating-point
operations into bit-vector operations. In particular, ESBMC can use
off-the-shelf SMT solvers that offer support for bit-vectors only to encode
floating-point arithmetic.


Keywords: Automated Test Generation - Bounded Model Checking - 
Software Testing - Satisfiability Modulo Theories. 


1 Test Generation Approach 


ESBMC [3,7] is an SMT-based bounded model checker for the verification of 
safety properties and assertions in both sequential and multi-threaded C pro- 
grams. ESBMC primarily aims to help software developers by finding subtle 
bugs in their code (e.g., array bounds violation, null-pointer dereference, arith- 
metic overflow, and deadlock). It also implements k-induction [5,10] and can 
be used to prove the absence of property violations, i.e., program correctness. 
In Test-Comp’20 [1], ESBMC produces test cases using the falsification mode, 
which is an iterative bounded model checking (BMC) approach that repeatedly 
unwinds the program until it either finds a property violation or exhausts time 
or memory limits. Intuitively, ESBMC aims to find a counterexample with up 
to k loop unwindings. The algorithm relies on the symbolic execution engine to 
increasingly unwind the loop after each iteration. ESBMC uses the falsification 
mode because it is known that there exist property violations in all programs
in Test-Comp, so there is no need to prove correctness. From the
counterexample produced by ESBMC, we define the test specification required by
the competition using an external Python script.
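
The iterative unwinding of the falsification mode can be illustrated with the sketch below. It is a toy model under stated assumptions, not ESBMC's algorithm: the program is a deterministic transition function rather than a symbolic encoding, max_k stands in for the time and memory limits, and starting at a bound equal to the increment is a choice made for this example.

```python
def bounded_check(step, initial, violates, k):
    """Explore at most k unwindings; return the trace to a violation or None."""
    state = initial
    trace = [state]
    for _ in range(k):
        if violates(state):
            return trace
        state = step(state)
        trace.append(state)
    return trace if violates(state) else None


def falsify(step, initial, violates, k_step=5, max_k=100):
    """Increase the unwinding bound until a violation is found or max_k is hit."""
    k = k_step
    while k <= max_k:
        trace = bounded_check(step, initial, violates, k)
        if trace is not None:
            return k, trace  # the trace plays the role of the counterexample
        k += k_step
    return None  # resource limit reached without finding a violation


if __name__ == "__main__":
    # Toy loop: a counter that violates its property when it reaches 12.
    print(falsify(lambda s: s + 1, 0, lambda s: s == 12))
```
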
ESBMC runs with an improved SMT backend for test-case generation, which
includes a floating-point encoding layer that converts all floating-point
operations into bit-vector operations (a process called bit-blasting) when encoding the
program into an SMT formula. Previous ESBMC versions [8] were only able to
encode and verify programs using a fixed-point representation for floating-point
numbers. This particular encoding is a valid approximation since fixed-point numbers
are used in a large number of applications in the embedded world; however, it
prevented ESBMC from verifying the broad set of programs that rely on processors
that implement floating-point arithmetic.

There exist various strategies to solve SMT formulae with floating-point 
arithmetic. It is tempting to use a real arithmetic strategy to tackle these for- 
mulae; however, the floating-point arithmetic is an approximation of the real 
one and introduces a new set of values (e.g., NaNs). ESBMC follows the same 
approach as CBMC [2] and 2LS [15], which also bit-blast all operations, includ- 
ing floating-point operations, before checking satisfiability using SAT solvers. 
The bit-blasting algorithm in ESBMC is based on the bit-blasting performed 
by Z3, which is an improved version of the algorithms described by Muller et 
al. [12]. A floating-point number is encoded into SMT using a single bit-vector and
follows the IEEE-754 [11] standard for the size of the exponent and significand.
For instance, a half-precision floating-point number (16 bits) has 1 bit for the sign, 5
bits for the exponent and 11 bits for the significand (1 hidden bit) [11]. Thus,
the floating-point encoding layer in ESBMC performs the operations on the
bit-vectors representing the floating-point numbers, e.g., the formula to check if a
bit-vector is a NaN checks if the exponent is all 1’s and if the significand is not zero.
The resulting SMT formulae are the translation of the floating-point arithmetic
digital circuits to SMT [12].
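
As a concrete instance of the NaN check on a bit-vector, the sketch below extracts the exponent and the stored significand of a 16-bit half-precision pattern and applies the test described above. It operates on concrete integers rather than SMT terms, so it only illustrates the bit-level condition; the function names are illustrative.

```python
import struct


def half_bits(value):
    """Return the 16-bit IEEE 754 half-precision pattern of a Python float."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def is_nan_half(bits):
    """NaN check on a 16-bit pattern: exponent all ones, significand non-zero."""
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits
    significand = bits & 0x3FF       # 10 stored bits (the 11th bit is hidden)
    return exponent == 0x1F and significand != 0


if __name__ == "__main__":
    for value in (1.0, float("inf"), float("nan")):
        print(value, hex(half_bits(value)), is_nan_half(half_bits(value)))
```

Infinity shares the all-ones exponent with NaN, which is why the significand must be inspected as well.
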

The improved SMT backend is an extension of our previous work on floating-point
arithmetic encoding [9]. Previously, we extended ESBMC to encode floating-point
arithmetic into SMT; however, we were restricted to SMT solvers that
supported the FP theory natively (i.e., Z3, MathSAT and CVC4) [9]. Now, the
floating-point encoding layer extends the FP theory support to all solvers
supported by ESBMC, including Boolector [13] and Yices [4], which do not natively
support the FP theory. In Test-Comp’20, ESBMC uses Boolector 3.0.1 and
produces 470 confirmed test specifications. In particular, ESBMC achieved the
highest score in ReachSafety-Floats, a category focused on programs
with floating-point arithmetic, correctly verifying 30 out of the 32 test cases
and outperforming all other tools in this category. The results in this category
demonstrate the effectiveness of the floating-point bit-blasting: Boolector does
not support the FP theory natively and yet was able to reason about almost all
the test cases in the competition that involved floating-point arithmetic.


2 Strengths and Weaknesses 


The falsification mode allows ESBMC to keep unwinding the program until a 
property violation is found, or until it exhausts time or memory limits. Its BMC 
approach, however, stops after it has found a property violation and prevents 
the generation of test specifications for multiple property violations or coverage
testing. This approach, however, is an advantage in the Cover-Error category
as finding one error is the primary goal.

Encoding programs using the SMT FP theory has several advantages over the 
fixed-point approach. ESBMC can now accurately model C programs that use 
the IEEE floating-point arithmetic [11]. In particular, ESBMC ships with models 
for most of the current C11 standard functions. Furthermore, the floating-point 
encoding layer in ESBMC extends the support for the SMT FP theory to solvers 
that do not support it natively. ESBMC can verify programs with floating-point 
arithmetic using all currently supported solvers — including Boolector and Yices, 
which do not support the SMT FP theory. 

In the Test-Comp’20 results, 470 tests were confirmed while 13 tests were
unconfirmed, where 11 were due to bugs in the script that generates the test
specification (e.g., non-deterministic unions or duplication of non-deterministic values)⁵,
1 was due to a bug in ESBMC that caused the tool to fail⁶, and 1 was due to
undefined behavior in the test case⁷. We chose Boolector for the competition because
it outperforms all other SMT solvers supported by ESBMC. In the
ReachSafety-Floats category, Boolector even outperforms all other SMT solvers that natively
support the FP theory. We believe that Boolector employs more abstract and less
expensive techniques (e.g., algebraic reduction rules and contextual simplification)
before bit-blasting SMT formulae into SAT.

The drawback of floating-point encodings is that they are very complex;
it is not uncommon to see SMT solvers struggling to support every corner
case [6,14]. The maintenance of our floating-point encoding layer is hard, and we
do not yet have proof that it is entirely correct, even though empirical evidence [9]
points in that direction and suggests that the approach is efficient in finding bugs,
as shown by the Test-Comp’20 results. The complex bit-vector formulae also prevent
high-level reasoning about the problem by the SMT solver; however, this is not a
significant issue for ESBMC as all high-level simplifications are performed before
encoding the program into SMT formulae.


3 Tool Setup and Configuration 


In order to run our esbmc-wrapper.py script, one must set the architecture 
(i.e., 32 or 64-bit), the competition strategy (i.e., k-induction, falsification, or 
incremental BMC), the property file path, and the benchmark path, as: 


esbmc-wrapper.py [-a {32, 64}] [-p PROPERTY_FILE] 
[-s {kinduction,falsi,incr,fixed}] 
[BENCHMARK_PATH] 


5 https://github.com/esbmc/esbmc/issues/142

6 https://github.com/esbmc/esbmc/issues/143

7 https://github.com/sosy-lab/sv-benchmarks/pull/1073

8 https://gitlab.com/sosy-lab/test-comp/archives-2020/blob/master/2020/esbmc-falsi.zip
where -a sets the architecture, -p sets the property file path, and -s sets the 
strategy (e.g., kinduction, falsi, incr, or fixed). In Test-Comp’20, ESBMC 
uses falsi for falsification. 

Internally, by choosing the falsification strategy, the following options are 
set when executing ESBMC: --no-div-by-zero-check, disables the division 
by zero check (required by Test-Comp); --force-malloc-success, sets that 
all dynamic allocations succeed (a Test-Comp requirement); --floatbv, en- 
ables floating-point SMT encoding; --falsification, enables the falsification 
mode; --unlimited-k-steps, removes the upper limit of iteration steps in 
the falsification algorithm; --witness-output, sets the witness output path; 
--no-bounds-check and --no-pointer-check disable bounds check and pointer 
safety checks, resp., since we are only interested in finding reachability bugs; 
--k-step 5, sets the falsification increment to 5; --no-allign-check, disables 
pointer alignment checks; and --no-slice, disables slicing of unnecessary
instructions. The BenchExec tool-info module is named esbmc.py and the
benchmark definition file is esbmc-falsi.xml.


4 Software Project 


The ESBMC source code is written in C++ and is available for download on
GitHub⁹, which includes self-contained binaries for ESBMC v6.1 64-bit. ESBMC
is publicly available under the terms of the Apache License 2.0. Instructions for
building ESBMC from the source code are given in the file BUILDING (including
the description of all dependencies). ESBMC is an international joint project
with the SIDIA Instituto de Ciência e Tecnologia, Federal University of Amazonas,
University of Southampton, University of Manchester, and the University
of Stellenbosch.


9 https://github.com/esbmc/esbmc


References


1. Beyer, D.: Second competition on software testing: Test-Comp 2020. In: Proc.
   FASE. LNCS, Springer (2020)

2. Clarke, E., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In:
   Tools And Algorithms For The Construction And Analysis Of Systems. LNCS,
   vol. 2988, pp. 168-176 (2004)

3. Cordeiro, L.C., Fischer, B.: Verifying multi-threaded software using SMT-based
   context-bounded model checking. In: International Conference on Software
   Engineering. pp. 331-340 (2011)

4. Dutertre, B.: Yices 2.2. In: Computer-Aided Verification. LNCS, vol. 8559, pp.
   737-744 (2014)

5. Eén, N., Sörensson, N.: Temporal induction by incremental SAT solving. Electronic
   Notes in Theoretical Computer Science 89(4), 543-560 (2003)

6. Erkök, L.: Bug in floating-point conversions.
   https://github.com/Z3Prover/z3/issues/1564 (2018), [Online; accessed January-2020]

7. Gadelha, M.R., Monteiro, F., Cordeiro, L., Nicole, D.: ESBMC v6.0: Verifying C
   programs using k-induction and invariant inference. In: Tools And Algorithms For
   The Construction And Analysis Of Systems. LNCS, vol. 11429, pp. 209-213 (2019)

8. Gadelha, M.R., Monteiro, F.R., Morse, J., Cordeiro, L.C., Fischer, B., Nicole, D.A.:
   ESBMC 5.0: An industrial-strength C model checker. In: Automated Software
   Engineering. pp. 888-891 (2018)

9. Gadelha, M.Y.R., Cordeiro, L.C., Nicole, D.A.: Encoding floating-point numbers
   using the SMT theory in ESBMC: An empirical evaluation over the SV-COMP
   benchmarks. In: Simpósio Brasileiro De Métodos Formais. LNCS, vol. 10623, pp.
   91-106 (2017)

10. Gadelha, M.Y.R., Ismail, H.I., Cordeiro, L.C.: Handling loops in bounded model
    checking of C programs via k-induction. Software Tools for Technology Transfer
    19(1), 97-114 (2017)

11. IEEE: IEEE Standard For Floating-Point Arithmetic (2008), IEEE 754-2008

12. Muller, J.M., Brisebarre, N., Dinechin, F., Jeannerod, C.P., Lefèvre, V., Melquiond,
    G., Revol, N., Stehlé, D., Torres, S.: Handbook of Floating-Point Arithmetic.
    Birkhäuser Boston, 1st edn. (2010)

13. Niemetz, A., Preiner, M., Biere, A.: Boolector 2.0 system description. Journal on
    Satisfiability, Boolean Modeling and Computation 9, 53-58 (2014)

14. Noetzli, A.: Failing precondition when multiplying 4-bit significand/4-bit
    exponent floats. https://github.com/CVC4/CVC4/issues/2182 (2018), [Online;
    accessed January-2020]

15. Schrammel, P., Kroening, D., Brain, M., Martins, R., Teige, T., Bienmüller, T.:
    Incremental bounded model checking for embedded software (extended version).
    Formal Aspects of Computing 29(5), 911-931 (2017)


Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.

The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.




TracerX: Dynamic Symbolic Execution with 
Interpolation (Competition Contribution) 


Joxan Jaffar, Rasool Maghareh, Sangharatna Godboley, and
Xuan-Linh Ha


National University of Singapore, Singapore, Singapore
{joxan,rasool,sanghara,hax1}@comp.nus.edu.sg
http://www.springer.com/gp/computer-science/lncs


Abstract. Dynamic Symbolic Execution (DSE) is an important method 
for testing of programs. An important system on DSE is KLEE [1] which 
inputs a C/C++ program annotated with symbolic variables, compiles 
it into LLVM, and then emulates the execution paths of LLVM using 
a specified backtracking strategy. The major challenge in symbolic ex- 
ecution is path explosion. The method of abstraction learning [7] has 
been used to address this. The key step here is the computation of an 
interpolant to represent the learned abstraction. 

TracerX, our tool, is built on top of KLEE and it implements and uti- 
lizes abstraction learning. The core feature in abstraction learning is sub- 
sumption of paths whose traversals are deemed to no longer be necessary 
due to similarity with already-traversed paths. Despite the overhead of 
computing interpolants, the pruning of the symbolic execution tree that 
interpolants provide often brings significant overall benefits. In particular,
TracerX can fully explore many programs that a non-pruning system
such as KLEE cannot.


Keywords: Dynamic Symbolic Execution, Interpolation, Testing, Code 
Coverage 


1 Overview and Software Architecture 


Symbolic execution has emerged as an important method to reason about pro- 
grams, in both verification and testing. By reasoning about inputs as symbolic 
entities, its fundamental advantage over traditional black-box testing, which uses 
concrete inputs, is simply that it has better coverage of program paths. In par- 
ticular, dynamic symbolic execution (DSE), where the execution space is ex- 
plored path-by-path, has been shown effective in systems such as DART [4] and 
KLEE [1]. A key advantage of DSE is that by examining a single path, the
analysis can be both precise and efficient. However, the key disadvantage of DSE is
that the number of program paths is in general exponential in the program size, 
and most available implementations of DSE do not employ a general technique 
to prune away some paths. 

In TracerX, our primary objective is to address the path explosion problem in 
DSE. More specifically, we wish to perform path-by-path exploration of DSE to 
enjoy its benefits, but we include a pruning mechanism so that a generated path
can be eliminated if it is guaranteed not to violate the stated safety conditions.
Toward this goal, we employ the method of abstraction learning [7], which is
more popularly known as lazy annotations [8,9].


[Figure: block diagram with the labels C++, ObjC, LLVM IR, KLEE, SMT Solver,
Interpolation Engine, Annotations, and Output]

Fig. 1. TracerX Framework


The software architecture of TracerX is presented in Fig. 1. The core feature
of TracerX is the use of interpolation, which serves to generalize the
context of a node in the symbolic execution
tree (SET) with an approximation of the weakest precondition of the node. This
method was implemented in the TRACER system [6], which was the first system
to demonstrate DSE with pruning. TRACER was primarily used to evaluate new
algorithms in verification, analysis and testing, e.g., [2,3,5]. While TRACER was
able to perform bounded verification and testing on many examples, it could not
accommodate industrial programs which often dynamically manipulate the heap
memory. TracerX combines the state-of-the-art DSE technology used in KLEE
with the pruning technology in TRACER to address this issue.


Now we explain interpolation in more detail. While exploring the SET, an
interpolant of a state is an abstraction of it which ensures the safety of the
subtree rooted at that state. In other words, if we continue the execution with
the interpolant instead of the state we will not reach any error. Thus, upon
encountering another state of the same program point, if the context of the
state implies the interpolant formula, then continuing the execution from the
new state will not lead to any error. Consequently, we can prune the subtree
rooted at the new state.

    x = 0;
    if ( b1 ) x += 12;
    if ( b2 ) x += 15;
    assert (x != 28);

Fig. 2. A Sample Program


Example 1. Consider the program in Fig. 2 and its SET explored by SE with
interpolation in Fig. 3. The variables b1, b2 are symbolic and all combinations
of the boolean conditions are satisfiable. The final statement
assert($x \neq 28$) is the target. The path condition for every path is shown in
the set in black color.

[Figure: symbolic execution tree annotated with path conditions and interpolants]

Fig. 3. SET with Interpolation of Program in Fig. 2

We traverse the SET in a left-right depth-first manner. At the end of the first
path, $x = 27$, which does not violate the assertion. Considering the target and
the update on variable x between $\langle 5a \rangle$ and $\langle 7a \rangle$, we generate
an interpolant which stores the weakest precondition at $\langle 5a \rangle$: $x \neq 13$
(shown in purple color). Similarly, an interpolant is also computed at
$\langle 6a \rangle$: $x \neq 28$.


Now, combining these two interpolants, we generate an interpolant for the
node $\langle 4a \rangle$. Note that the weakest precondition here is
$(b2 \rightarrow (x \neq 13)) \wedge (!b2 \rightarrow (x \neq 28))$. We approximate this
formula with the conjunction $(x \neq 13) \wedge (x \neq 28)$. Next, moving to
$\langle 2a \rangle$, the interpolant at $\langle 4a \rangle$ is received and, considering
the update on variable x between $\langle 2a \rangle$ and $\langle 4a \rangle$, an interpolant is
generated at $\langle 2a \rangle$: $x \neq 1 \wedge x \neq 16$. Now moving to $\langle 4b \rangle$, we
check if the path condition at $\langle 4b \rangle$ ($x = 0 \wedge !b1 \wedge skip$) implies the
interpolant that was generated at $\langle 4a \rangle$ ($x \neq 13 \wedge x \neq 28$). Since the
implication holds, node $\langle 4b \rangle$ is subsumed by node $\langle 4a \rangle$ (indicated by
the orange arrow) and the subtree below $\langle 4b \rangle$ is pruned. The SET traversal
continues by computing the interpolant at $\langle 3a \rangle$, which is computed from
$x \neq 13 \wedge x \neq 28$ subsuming $\langle 4b \rangle$ and the updates between
$\langle 3a \rangle$ and $\langle 4b \rangle$ (which is skip). The interpolants at $\langle 2a \rangle$ and
$\langle 3a \rangle$ are then combined to generate an interpolant at $\langle 1a \rangle$:
$x \neq 1 \wedge x \neq 16 \wedge x \neq 13 \wedge x \neq 28$. Note that KLEE
would explore the 4 paths in the SET while TracerX explores only two paths to
the end.
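
The reasoning of Example 1 can be checked mechanically. The sketch below is a brute-force illustration, not TracerX's interpolation engine: it enumerates concrete values instead of reasoning symbolically, and the function names and the tested value range are assumptions for this example. It confirms that the interpolant at the program point before the second branch is sufficient for safety and that the state reached with b1 false (x = 0) satisfies it, so that state can be subsumed.

```python
from itertools import product


def run(b1, b2):
    """Concrete execution of the sample program; returns the final x."""
    x = 0
    if b1:
        x += 12
    if b2:
        x += 15
    return x


def interpolant_at_4(x):
    """Interpolant at the program point before the branch on b2."""
    return x != 13 and x != 28


def safe_from_4(x):
    """True if the assertion holds for both outcomes of the branch on b2."""
    return all(x + (15 if b2 else 0) != 28 for b2 in (False, True))


if __name__ == "__main__":
    # All four paths satisfy the assertion x != 28.
    print(all(run(b1, b2) != 28 for b1, b2 in product((False, True), repeat=2)))
    # Every value satisfying the interpolant is safe from that point on.
    print(all(safe_from_4(x) for x in range(-100, 100) if interpolant_at_4(x)))
    # The second visit (b1 false, x = 0) implies the interpolant: subsumed.
    print(interpolant_at_4(0))
```
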


2 Discussion on Strengths and Weaknesses 


In Test-Comp 2020, TracerX ranked 6th overall. Inspecting the results,
TracerX was one of the teams with the highest score in cover-branches.BitVectors
and cover-error.ControlFlow. Moreover, TracerX was one of the top 3 scorers
in cover-branches.DeviceDriversLinux64, cover-branches.ControlFlow, and
cover-error.BitVectors.


TracerX also accomplished more tasks by a meaningful margin compared to
KLEE in cover-branches.BusyBox and cover-branches.MainHeap. On the other
hand, TracerX performed poorly in 3 sub-categories: cover-error.ReachSafety-ECA,
ReachSafety-Sequentialized (both branches) and cover-error.Floats¹.


We should emphasize that TracerX in general requires symbolic execution
trees to be bounded. Otherwise, interpolants cannot be computed. Moreover,
TracerX is a heavy-weight approach, and the overhead pays off as the problems
get harder. As a result, light-weight approaches are expected to have
better results than TracerX under short timeouts and tight memory limits.


Moreover, it appears that two settings might have had a profound effect on
reaching the timeout on the test programs: the configuration we used to explore
unbounded programs (max-depth=1000) and the BenchExec tool-info module, which
wrongly ran TracerX with the default memory (2GB) instead of 15GB RAM.


1 TracerX does not support symbolic expressions over floating point arithmetic. 
3 Tool Setup and Configuration 


The TracerX version used in Test-Comp 2020 is available² at
https://gitlab.com/sosy-lab/test-comp/archives-2020/blob/testcomp20/2020/tracerx.zip. The
configuration and running of TracerX are similar to KLEE. TracerX has
some extra command line arguments. Firstly, the argument “-solver-backend=z3”
should be provided to run TracerX with interpolation. Without this option, TracerX
will run similarly to KLEE. TracerX can do exploration in both the Random
and DFS modes. However, the DFS exploration mode (using “-search=dfs”) is
preferred since it naturally increases the chance of generating interpolants. Furthermore,
the option “-subsumed-test” should be used to generate a test case
from the subsumed nodes. This option is required for the coverage competition.
The following is a sample full command line after compiling and running
tracerx.py:
“../tracerx-svcomp/bin/../tracerx_build/Release+Asserts/bin/klee -max-memory=14305
-output-dir=../tracerx-svcomp/bin/../test-suite -search=dfs
-solver-backend=z3 -write-xml-tests -tc-orig=s3_clnt_3.BV.c.cil-2a.c
-tc-hash=acd2272114f13977ea7bdc712c7567ec2e43dc8e07ef033eb67487bab7f66d59
--dump-states-on-halt=false -exit-on-error-type=Assert -max-depth=1000
-max-time=900 /tmp/tmpvwkb459r/s3_clnt_3.BV.c.cil-2a.c.bc”

The two command line options “-max-memory” and “-max-time” are used
to set the maximum memory and time budget. The options “-write-xml-tests”,
“-tc-orig”, and “-tc-hash” record the test input information. Once the
halt instruction is invoked, “-dump-states-on-halt” creates a test case from all
active states³. The option “-exit-on-error-type=Assert” terminates the search as
soon as a bug is found (used only for coverage categories). The command line
option “-max-depth=1000” is used to bound the maximum number of branches
explored in unbounded paths.


4 Software Project and Contributors 


Information about TracerX, including a self-contained binary, is publicly available at
https://www.comp.nus.edu.sg/~tracerx/. The source code can be accessed
in the https://github.com/tracer-x/klee repository. The authors of this paper and other
colleagues have developed and contributed to TracerX at the National University of
Singapore, Singapore. The authors of this paper acknowledge the direct and
indirect support of their students, former researchers, and colleagues.


2 The benchexec tool-info file is https://github.com/sosy-lab/benchexec/blob/master/benchexec/tools/tracerx.py
and the benchmark description file is
https://gitlab.com/sosy-lab/test-comp/bench-defs/blob/master/benchmark-defs/tracerx.xml.

3 This was disabled to save execution time. However, it would have been better to 
enable this option for maximum coverage. 
Abstract. LibKluzzer is a novel implementation of hybrid fuzzing, which 
combines the strengths of coverage-guided fuzzing and dynamic symbolic 
execution (a.k.a. whitebox fuzzing). While coverage-guided fuzzing can 
discover new execution paths at nearly native speed, whitebox fuzzing 
is capable of getting through complex branch conditions. In contrast
to existing hybrid fuzzers, which operate directly on binaries, LibKluzzer
leverages the LLVM compiler framework to work at the source code
level. It employs LibFuzzer as the coverage-guided fuzzing component
and KLUZZER, an extension of KLEE, as the whitebox fuzzing component.


Keywords: Hybrid Fuzzing · Coverage-guided Fuzzing · Symbolic Execution · LLVM.


1 Test Generation Approach 


LibKluzzer is based on hybrid fuzzing, which tries to combine the strengths of
coverage-guided fuzzing and whitebox fuzzing. Most existing advanced hybrid
fuzzers, e.g. [6,7,8], employ coverage-guided fuzzing as the main search algorithm
and apply whitebox fuzzing only selectively on the most promising inputs. While
such an advanced approach is also under development and evaluation for
LibKluzzer, for simplicity and given the short time frame available for adapting
to Test-Comp, the participating version of LibKluzzer combines coverage-guided
fuzzing and whitebox fuzzing in a very simple way. Without any intrinsic integration,
multiple instances of coverage-guided fuzzing and whitebox fuzzing are
scripted to run in parallel, each in its own OS process. They operate on a common
corpus to share their individual progress. Each instance keeps an in-memory
set of inputs it has generated, together with the code coverage achieved
so far. Whenever an instance discovers an input that covers new code, it writes
this input as a file to the common corpus. The corpus is scanned periodically
by the instances to check for newly added files. Despite (or thanks to) its
simplicity, LibKluzzer managed to perform very well in Test-Comp 2020.
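
The corpus-sharing scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not LibKluzzer's implementation: the class name `FuzzInstance`, the `coverage_of` callback, and the file naming by SHA-1 digest are all hypothetical, and coverage is reduced to a toy function.

```python
import hashlib
import os
import tempfile


class FuzzInstance:
    """One fuzzing process in miniature: in-memory inputs plus coverage."""

    def __init__(self, corpus_dir, coverage_of):
        self.corpus_dir = corpus_dir
        self.coverage_of = coverage_of   # hypothetical: bytes -> set of branch ids
        self.inputs = set()
        self.covered = set()
        self.seen_files = set()

    def try_input(self, data):
        """Keep `data` and publish it to the corpus if it covers new code."""
        new = self.coverage_of(data) - self.covered
        if not new:
            return False
        self.covered |= new
        self.inputs.add(data)
        name = hashlib.sha1(data).hexdigest()
        self.seen_files.add(name)
        with open(os.path.join(self.corpus_dir, name), "wb") as handle:
            handle.write(data)
        return True

    def scan_corpus(self):
        """Import files that other instances added since the last scan."""
        imported = 0
        for name in sorted(os.listdir(self.corpus_dir)):
            if name in self.seen_files:
                continue
            self.seen_files.add(name)
            with open(os.path.join(self.corpus_dir, name), "rb") as handle:
                data = handle.read()
            new = self.coverage_of(data) - self.covered
            if new:
                self.covered |= new
                self.inputs.add(data)
                imported += 1
        return imported


def toy_coverage(data):
    """Toy stand-in for coverage: every distinct byte value is a 'branch'."""
    return set(data)


corpus = tempfile.mkdtemp()
a = FuzzInstance(corpus, toy_coverage)
b = FuzzInstance(corpus, toy_coverage)
a.try_input(b"\x01\x02")
b.try_input(b"\x03")
a.scan_corpus()
```

After the scan, instance `a` also knows the input found by `b`, without the two instances communicating in any other way than through the shared directory.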
2 Software Architecture 


The two major components of LibKluzzer are LibFuzzer [1] for coverage-guided
fuzzing and KLUZZER [5] for whitebox fuzzing. As mentioned earlier, KLUZZER
is an extension of KLEE [2]. While it uses most of the KLEE infrastructure,
including the underlying SMT solver STP [3], KLUZZER provides several significant
enhancements that make it more suitable for hybrid fuzzing (see [5] for more details).
For Test-Comp, both LibFuzzer and KLUZZER have been extended to
support its specific requirements. The extension involves writing test cases in
XML format, glue logic to convert the random byte array needed for the fuzzers
into a sequence of calls to nondet functions, and implementing a fuzzing target
as described later.


Workflow First, the C program under test undergoes a set of source-to-source
program transformations to enable in-process coverage-guided fuzzing. The transformed
program is then compiled using Clang to create an LLVM bitcode file
and an executable. The compilation involves, among others, code coverage instrumentation
and linking with LibFuzzer. Finally, the LLVM bitcode file is fed
to KLUZZER to perform whitebox fuzzing, while the executable is started in
two instances to perform coverage-guided fuzzing. These three fuzzing instances
run concurrently until they are terminated by the Test-Comp BenchExec runner
when the time limit is exceeded. They share generated inputs via a common corpus of files
as mentioned earlier and write XML test cases to the test suite on-the-fly.


Transformations for in-process fuzzing While the main components of
LibKluzzer are implemented in C++, the program transformations, which are
required to enable in-process coverage-guided fuzzing, consist of a set of Bash
and Python scripts. This form of fuzzing is much faster than traditional out-of-process
fuzzing, which forks a new process for each execution of the main
function, but requires the global state of the fuzzing target to remain largely unchanged
or to be reset between executions. The transformations essentially
perform the following steps for each benchmark:


1. rename the existing main function to FuzzMe; 

2. identify and duplicate global variables; 

3. insert additional functions: FuzzerSaveCtx to capture the initial global state
into the duplicated variables and FuzzerRestoreCtx to restore this state before
each new execution of the FuzzMe function;

4. redirect calls to exit and abort to custom functions to prevent unwanted 
early exit from the fuzzing loop. 


The current script-based implementation of these transformations is very
fragile and might not work out-of-the-box for non-Test-Comp benchmarks. The
next version of LibKluzzer will replace these with proper Clang-based
source-to-source transformations.
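
To make the textual nature of such transformations concrete, the following sketch applies steps 1 and 4 with regular expressions. It is a hedged illustration only: the function name, the patterns, and the replacement names `FuzzerExit` and `FuzzerAbort` are hypothetical and are not taken from the LibKluzzer scripts. Steps 2 and 3 need real parsing and are omitted.

```python
import re


def transform_for_in_process_fuzzing(c_source):
    """Apply steps 1 and 4 textually (steps 2 and 3 are not covered)."""
    # Step 1: rename the definition of main (only the `int main(` form).
    out = re.sub(r"\bint\s+main\s*\(", "int FuzzMe(", c_source)
    # Step 4: redirect exit/abort calls to custom functions (hypothetical names).
    out = re.sub(r"\bexit\s*\(", "FuzzerExit(", out)
    out = re.sub(r"\babort\s*\(", "FuzzerAbort(", out)
    return out


example = "int main() { if (nondet_int() > 3) abort(); exit(0); }"
transformed = transform_for_in_process_fuzzing(example)
print(transformed)
```

A pattern-based rewrite like this misses, for example, `void main()` or calls hidden behind macros, which illustrates why a compiler-based transformation is the more robust choice.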
int nondet_int() {                      int LLVMFuzzerTestOneInput(
  int Value = 0;                            uint8_t *Data, size_t Size) {
  if (Used + 4 <= Size) {                 FuzzerRestoreCtx();
    memcpy(&Value, Data + Used, 4);       MakeGlobalCopy(Data, Size);
    Used += 4;                            Used = 0;
  }                                       FuzzMe();
  return Value;                         }
}


Fig. 1. Implementation of nondet functions and fuzzing target for Test-Comp 


Test-Comp fuzzing target and nondet functions Both KLUZZER and
LibFuzzer require the definition of a fuzzing target, i.e. an implementation of
the declared LLVMFuzzerTestOneInput function. The main function provided
by the fuzzers will repeatedly call LLVMFuzzerTestOneInput with fuzz inputs
in a loop to perform fuzzing. Each fuzz input consists of an array of random
bytes and its size. Fig. 1 shows a conceptual implementation of
LLVMFuzzerTestOneInput on the right hand side. First, the initial global execution state is
restored. Then, the given fuzz input is copied into a global array and the number
of bytes already consumed for fuzzing is set to zero. Finally, FuzzMe is invoked.
During its execution, each time a nondet function is called to provide input, a
corresponding number of bytes from the global byte array is consumed to
create the requested value, as shown for int on the left hand side of Fig. 1.
With this conversion from random bytes, no changes are needed in the
core algorithms of KLUZZER and LibFuzzer for Test-Comp.
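
The byte-consumption scheme can be modelled in a few lines. The sketch below mirrors the int case of Fig. 1 in Python for illustration. The class name `FuzzInput` is hypothetical, and little-endian byte order is an assumption (it corresponds to what memcpy yields on common x86 targets).

```python
import struct


class FuzzInput:
    """Python model of the global byte array and the `Used` counter."""

    def __init__(self, data):
        self.data = bytes(data)
        self.used = 0

    def nondet_int(self):
        """Consume 4 bytes as a signed int; return 0 once the input is exhausted."""
        value = 0
        if self.used + 4 <= len(self.data):
            (value,) = struct.unpack_from("<i", self.data, self.used)
            self.used += 4
        return value


fuzz = FuzzInput(b"\x2a\x00\x00\x00\xff\xff\xff\xff\x07")
values = [fuzz.nondet_int() for _ in range(3)]
print(values)
```

The third call finds only one byte left, so it returns the default value 0 and leaves the counter unchanged, just as the guard in the C version does.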


3 Strengths and Weaknesses 


The main strength of LibKluzzer lies in achieving high code coverage, as demonstrated
by winning the branch coverage category of Test-Comp. Multiple factors
contribute to this success, including the extremely high throughput of in-process
coverage-guided fuzzing implemented by LibFuzzer and the use of generational
search in KLUZZER, a coverage-maximizing search heuristic for dynamic symbolic
execution/whitebox fuzzing first proposed by SAGE [4]. The individual
contribution of each single component is to be analyzed more thoroughly in a
further detailed study.

The main conceptual weakness of LibKluzzer is that the same coverage-maximizing
search strategy is used for reaching error calls. It is surprising
that LibKluzzer still achieved second place in the corresponding category.
We expect that adapting the search heuristics of both LibFuzzer and KLUZZER
to be directed by the distance to the location of error calls should improve the
performance significantly.

In particular, the large ECA benchmarks have proven to be problematic for both
LibFuzzer and KLUZZER and hence also for LibKluzzer. The sequence of nondet
values required to reach the error calls is very specific and nearly impossible to
find with coverage-guided fuzzing, while KLUZZER suffers from path explosion.
In addition to error-directed search, path/state merging might be required to
efficiently deal with these benchmarks.

A further weakness is that LibKluzzer makes little effort to minimize the
test suite with respect to both the size of the test suite and the size of each
test case. Too many redundant test cases might cause the validator to time out.
Furthermore, some produced test cases are so large that they hit a corner case in the
validator and force it to exceed the given memory limit. In these cases, the
validator crashes prematurely, leaving the remaining test cases uncounted.


4 Tool Setup and Configuration 


Installation The LibKluzzer archive submitted to Test-Comp 2020 (version 0.6)
can be downloaded from
https://gitlab.com/sosy-lab/test-comp/archives-2020/blob/testcomp20/2020/libkluzzer.zip.
After unpacking, the main executable script LibKluzzer can be found in the bin folder.


Configuration The main script has been configured to reflect the resource 
restrictions of Test-Comp 2020. LibKluzzer treats every benchmark as 64-bit 
and always tries to maximize code coverage, and thus is agnostic to the property 
and architecture specification. The only meaningful parameter is the path to the 
source code file of the benchmark. 


Participation LibKluzzer participates in both available categories of
Test-Comp 2020: Finding Bugs and Code Coverage.


5 Software Project and Contributors 


LibKluzzer and KLUZZER are being developed by the author at the University of
Bremen, Germany. This research and development are supported by the Central
Research Development Fund, University of Bremen, Germany within the project
SYMVIR. The source code of LibKluzzer will be made available at
https://github.com/hoangmle/LibKluzzerTestComp2020Submission. Much of the credit
should go to the respective development teams of LibFuzzer and KLEE, which
lay the foundation for LibKluzzer.
Abstract. Our CoVeriTest submission, which is implemented in the
analysis framework CPAchecker, uses verification techniques for automatic
test-case generation. To this end, it checks the reachability of every test
goal and generates one test case per reachable goal. Instead of checking
the reachability of every test goal individually, which is too expensive,
CoVeriTest considers all test goals at once and removes already covered
goals from future reachability queries. To deal with the diverse set of
Test-Comp tasks, CoVeriTest uses a hybrid approach that interleaves
value and predicate analysis. In contrast to Test-Comp’19, the time limit
per iteration is no longer fixed for an analysis. Instead, we fix the iteration
time limit and split it dynamically among the analyses, rewarding analyses
that previously covered more test goals per time unit.


Keywords: Test-case generation · Cooperative verification · CPAchecker


1 Test-Generation Approach 


Test-case generation approaches have different strengths and weaknesses. To deal
with the diverse Test-Comp benchmark, we therefore use a hybrid approach.
More concretely, our Test-Comp’20 submission CoVeriTest combines different
verification approaches using the idea of cooperative, verifier-based testing [6].

Figure 1 shows the workflow of our CoVeriTest submission. Like in
Test-Comp’19, CoVeriTest iteratively combines a value analysis [5], which only
tracks the explicit values of those variables stored in its precision, and a predicate
analysis, which applies adjustable block encoding [4] and abstracts at loop heads
only. Both analyses use counterexample-guided abstraction refinement [8] to
adjust their precision (the set of tracked variables or the set of predicates) and
check which open test goals can be reached. Whenever one analysis reaches a test
goal, i.e., it finds a real counterexample, a test case adhering to the Test-Comp
exchange format is constructed from that counterexample [1] and the test goal


* This work was funded by the Hessian LOEWE initiative within the Software-Factory 
4.0 project. 
is removed from the set of open test goals. Depending on the Test-Comp’20
property, the set of test goals is initialized to the set of all __VERIFIER_error()
calls or the set of all branches.

Like in Test-Comp’19, both analyses resume their exploration from the
previous round and do not exchange any further information. The novelty
for Test-Comp’20 is the dynamic adjustment of the analyses’ time limits.
To better adjust to the program, we redistribute the iteration time limit
among the analyses after each iteration round. Initially, we grant the
value analysis 20s and the predicate analysis 80s. Thereafter, we use the
normalized progresses $p_V$ and $p_P$ reported by the analyses to compute the new
time limits. The normalized progress is the number of test goals covered by the
analysis in the round divided by the total number of test goals. If no analysis
made progress ($p_V \le 0$ and $p_P \le 0$), we reuse the time limits from the
current round. Otherwise, we adjust the limits according to Eq. 1 ($i \in \{V, P\}$).
Each analysis gets at least 10s so that it is never switched off. The remaining 80s of the
iteration limit are redistributed according to the relative contribution of each
analysis. The relative contribution of an analysis is its progress per time limit
relative to the sum of the progresses per time limit.

Fig. 1: CoVeriTest workflow for Test-Comp’20 (components: test specification, program, test goals, value analysis [5], predicate analysis [4], time limit adaption (20s, 80s))


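$$\mathit{limit}_i = 10\,\mathrm{s} + \frac{p_i / \mathit{limit}_i}{p_V / \mathit{limit}_V + p_P / \mathit{limit}_P} \cdot 80\,\mathrm{s} \qquad (1)$$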
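
As a worked example of the time limit adaption, the following sketch recomputes the limits after one round. It is an illustration of the rule described above, not CPAchecker code: the function name and the dictionary-based interface are hypothetical, and the progress values in the example are made up.

```python
def adapt_time_limits(progress, limits, minimum=10.0, budget=80.0):
    """Return the new per-analysis time limits in seconds.

    progress: normalized progress p_i of each analysis in the last round
    limits:   time limits used in the round that just finished
    """
    if all(p <= 0 for p in progress.values()):
        return dict(limits)              # no progress: reuse the current limits
    # Progress per time limit, then each analysis' share of the total.
    rate = {i: progress[i] / limits[i] for i in progress}
    total = sum(rate.values())
    return {i: minimum + rate[i] / total * budget for i in progress}


initial = {"V": 20.0, "P": 80.0}
updated = adapt_time_limits({"V": 0.10, "P": 0.10}, initial)
print(updated)
```

In this made-up round both analyses cover the same fraction of goals, but the value analysis needed only a quarter of the time, so it receives four fifths of the redistributed 80s (74s in total versus 26s for the predicate analysis).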


The main differences to HybridTiger [11], which also applies cooperative, verifier-based
testing, are that HybridTiger uses multi-goal partitioning [10] and that
HybridTiger uses fixed time limits of 120s and 720s for the value and the predicate analysis, respectively.


2 Tool Architecture 


CoVeriTest is implemented within the Java-based software-analysis framework
CPAchecker [3], which uses the Eclipse CDT parser² and integrates different
SMT solvers via the JavaSMT [9] interface. For Test-Comp’20, we rely on
CPAchecker’s default SMT solver MathSAT5 [7].


2 https://www.eclipse.org/cdt/
CPAchecker’s core is the configurable program analysis framework [2], which
defines the basis for the verification approaches. The framework consists of two
parts: configurable program analyses (CPAs) and the CPA algorithm. CPAs like
the value and predicate analysis used by CoVeriTest describe program analyses.
To this end, they define the abstract domain and the analysis operators. The CPA
algorithm performs the reachability analysis for a given CPA and program.

To integrate further verification techniques, the CPA framework is enhanced
with algorithms like counterexample-guided abstraction refinement [8], the circular
algorithm, which performs a continuous iteration over a set of analyses, or the
test-case generation algorithm. To produce test cases, the test-case generation
algorithm wraps and runs another analysis, generates test cases from counterexamples [1]
returned by the wrapped analysis, updates the analysis specification
(i.e., removes covered goals), and thereafter continues the wrapped analysis.
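
The control flow of such a wrapping algorithm can be sketched as a simple loop. The code below is a simplified illustration, not the CPAchecker implementation: the `analysis` callback, its return convention, and the toy reachability data are hypothetical.

```python
def generate_tests(analysis, goals):
    """Wrap `analysis` and produce one test case per covered goal.

    `analysis(open_goals)` is a hypothetical callable that returns a
    (goal, counterexample) pair for some reachable open goal, or None
    when no further open goal can be reached.
    """
    open_goals = set(goals)
    tests = []
    while open_goals:
        result = analysis(frozenset(open_goals))
        if result is None:
            break
        goal, counterexample = result
        tests.append(counterexample)     # stands in for writing a test case
        open_goals.discard(goal)         # update the specification
    return tests, open_goals


REACHABLE = {"b1": [0], "b2": [5, 7]}    # toy data: goal -> input values


def toy_analysis(open_goals):
    """Report the first reachable open goal together with its inputs."""
    for goal in sorted(open_goals):
        if goal in REACHABLE:
            return goal, REACHABLE[goal]
    return None


tests, remaining = generate_tests(toy_analysis, {"b1", "b2", "b3"})
print(tests, remaining)
```

Because covered goals are removed before the analysis continues, no goal is reported twice, and the loop stops once the remaining goals are unreachable for the wrapped analysis.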


3 Strengths and Weaknesses 


CoVeriTest took third place in the category Cover-Branches and, in contrast
to Test-Comp’19, outperformed KLEE in this category.

The major change of CoVeriTest from Test-Comp’19 to Test-Comp’20
is the dynamic adjustment of the iteration time limits. Thus, many strengths
and weaknesses are still the same as in Test-Comp’19. CoVeriTest’s iterative
combination of predicate and value analysis helped it adapt to the diverse set
of Test-Comp tasks, and its direct search for the test goals led to few test cases.
Also, CoVeriTest still has problems with tasks that contain large arrays because
these are not supported by the underlying analyses. Furthermore, CoVeriTest
has problems with the new subcategory BusyBox-MemSafety and fails to parse
the programs in the new subcategory SQLite-MemSafety.

Now, let us discuss the effect of the adjustment of the time limits. For the
time limit adjustment, we use the progress of the analyses measured in number
of covered goals. Since there exists only one (reachable) test goal per task in
the Cover-Error category, either both analyses make no progress in an iteration
($p_V \le 0$ and $p_P \le 0$) or one analysis covers the goal and CoVeriTest stops.
Thus, the time limit adjustment has no effect on the Cover-Error category.


Next, let us consider the Cover-Branches category. Our own comparison of
the CoVeriTest submissions for Test-Comp’19 and Test-Comp’20 revealed that
the time limit adjustment mainly affects tasks of the ECA subcategory. In total,
the coverage value for 320 tasks decreased and the coverage value for 591 tasks
increased. Moreover, the increase is typically significantly larger than the decrease
(on average 6.3 percentage points increase compared to 1.5 percentage points decrease).
Furthermore, most of the tasks with a difference in the coverage value belong to
the ECA subcategory. Therefore, the time limit adjustment pays off. Nevertheless,
CoVeriTest could still perform better on the ECA subcategory. We believe that
one problem in the ECA subcategory is redundant test goals, which lead to the
same or a similar test case being generated multiple times and, thus, to a waste of time.
4 Setup and Configuration 


CoVeriTest is distributed as part of CPAchecker, which requires a Java 8
runtime environment. Our Test-Comp’20 submission, with which we participated
in all categories, uses CPAchecker in revision 32236. After the environmental
setup, one can run CoVeriTest on program program.i with the following
command. The file property.prp is a placeholder for the test specification, either
coverage-error-call.prp or coverage-branches.prp.


scripts/cpa.sh -testcomp20 -benchmark -heap 10000m 
-spec property.prp program.i 


The command above assumes that program.i runs in a 32-bit environment.
When requiring a 64-bit environment, one needs to add the parameter -64 to the
above command. Moreover, if the machine does not have enough RAM to handle the
specified Java heap memory, one can decrease the value passed with -heap.

The test suite generated during the execution of CoVeriTest is written to
the directory test-suite, which is a subdirectory within the output directory
of CPAchecker. As defined by the Test-Comp rules, the test suite contains a
metadata file and test-case files adhering to the required XML format.


5 Project and Contributors 


CoVeriTest is a component of the open-source project CPAchecker, which is
hosted by Dirk Beyer’s group at LMU Munich under Apache 2.0. Currently,
members of the Institute for System Programming of the Russian Academy of
Sciences, Masaryk University, and Technical University of Darmstadt also contribute
to CPAchecker. We would like to thank all contributors.
Abstract. LEGION is a grey-box coverage-based concolic tool that aims
to balance the complementary nature of fuzzing and symbolic execution
to achieve the best of both worlds. It proposes a variation of Monte Carlo
tree search (MCTS) that formulates program exploration as sequential
decision-making under uncertainty guided by the best-first search strategy.
It relies on approximate path-preserving fuzzing, a novel instance of
constrained random testing, which quickly generates many diverse inputs
that likely target program parts of interest. In Test-Comp 2020 [1], the
prototype performed within 90% of the best score in 9 of 22 categories.


Keywords: Symbolic Execution, Fuzzing, Monte Carlo Search 


1 Test-Generation Approach 


Coverage testing aims to traverse all execution paths of the program under test
to verify its correctness. Two traditional techniques for this task, symbolic
execution [6] and fuzzing [7], are complementary in nature [5].

Consider exploring the program Ackermann02 in Fig. 1 from the Test-Comp
benchmarks as an example. Symbolic execution can compute inputs to penetrate 
the choke point (line 10) to reach the “rare branch” (lines 14/15), but then 
becomes unnecessarily expensive in solving the exponentially growing constraints 
from repeatedly unfolding the recursive function ackermann. By comparison, even 
though very few random fuzzer-generated inputs pass the choke point, the high 
speed of fuzzing means the “rare branch” will be quickly reached. 

The following research question arises when exploring the program space in 
a conditional branch: Will it be more efficient to focus on the space under the 
constraint, or to flood both branches with unconstrained inputs, to target the 
internals of log(m,n) in line 11 at the same time? 

LEGION introduces MCTS-guided program exploration as a principled answer
to this question, tailored to each program under test. For a program like


* This research was supported by Data61 under the Defence Science and Technology 
Group’s Next Generation Technologies Program. 
 1  int ackermann(int m, int n) {
 2    if (m==0) return n+1;
 3    if (n==0) return ackermann(m-1,1);
 4    return ackermann(m-1,ackermann(m,n-1));
 5  }
 6
 7  void main() {
 8    int m = input(), n = input();
 9    // choke point selected for fuzzing
10    if ((m < 0 || m > 3) || (n < 0 || n > 23)) {
11      log(m,n); // common branch
12      return;
13    } else {
14      int r = ackermann(m,n); // rare branch
15      assert(m < 2 || r >= 4);
16    }
17  }

Fig. 1: Ackermann02.c

Fig. 2: MCTS-guided fuzzing in LEGION (legend: program entry state, program path, unknown paths, observed paths; score: estimate of the likelihood of finding new paths)


Fig. 2, LEGION estimates the expectation of finding new paths by the UCT score
(upper confidence bound for trees), a successful approach for games [3], aiming
to balance exploration of program space (where success is still uncertain) against
exploitation of partial results (that appear promising already). Code behind rare
branches is targeted by approximate path-preserving fuzzing to efficiently generate
diverse inputs for a specific sub-part of the program.

LEGION’s MCTS iteratively explores a tree-structured search space, whose
nodes represent partial execution paths. On each iteration, LEGION selects a
target node by recursively descending from the root along the highest scoring
child, stopping when a parent’s score exceeds its children’s. A node’s score is
based on the ratio of the number of distinct vs. all paths observed passing through
it, but nodes selected less often in the past are more likely to be chosen. Then,
approximate path-preserving fuzzing is applied to explore the target node. The
resulting execution traces are recorded and integrated into the tree.
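
The selection step can be illustrated with a small sketch. It uses the textbook UCT formula with an exploration constant `rho`, which is an assumption: the exact score, the constant, and the handling of unvisited nodes in LEGION may differ. The `Node` class and the example numbers are hypothetical.

```python
import math


class Node:
    def __init__(self, name, distinct, total, selected, children=()):
        self.name = name
        self.distinct = distinct     # distinct paths observed through the node
        self.total = total           # all paths observed through the node
        self.selected = selected     # how often the node was selected so far
        self.children = list(children)


def uct(node, parent_selected, rho=math.sqrt(2)):
    """Exploitation ratio plus an exploration bonus (standard UCT form)."""
    if node.selected == 0 or node.total == 0:
        return math.inf              # never tried: explore it first
    exploit = node.distinct / node.total
    explore = rho * math.sqrt(math.log(max(parent_selected, 1)) / node.selected)
    return exploit + explore


def select(root):
    """Descend along the best child until the parent scores higher."""
    node, node_score = root, uct(root, root.selected)
    while node.children:
        best = max(node.children, key=lambda c: uct(c, node.selected))
        best_score = uct(best, node.selected)
        if node_score > best_score:
            break
        node, node_score = best, best_score
    return node


common = Node("common", distinct=1, total=90, selected=9)
rare = Node("rare", distinct=5, total=10, selected=1)
root = Node("entry", distinct=6, total=100, selected=10, children=[common, rare])
target = select(root)
print(target.name)
```

In this made-up tree the rarely selected branch with many distinct paths wins the selection, which matches the intent of favouring nodes that are both promising and under-explored.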

Approximate path-preserving fuzzing (APPF) quickly generates inputs that
likely follow the target program path, and therefore is crucial for LEGION’s efficiency.
LEGION’s APPF implementation extends the QUICKSAMPLER [4] technique,
which is a recent mutation-based algorithm that expands a small set of
constraint solutions to a larger suite of likely solutions. LEGION extends
QUICKSAMPLER from propositional logic to bitvector path constraints.
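
The idea of expanding one solution into many likely solutions can be sketched on a single 8-bit input. The code below is only an illustration in the spirit of mutation-based sampling and is not LEGION's or QUICKSAMPLER's implementation: the solver call is replaced by brute-force search, the toy constraint is made up, and combining two bit differences with a bitwise OR is an assumption that should be checked against [4].

```python
from itertools import combinations


def nearest_solution(constraint, seed, bit, width=8):
    """Stand-in for a solver call: closest solution with `bit` flipped."""
    wanted = ((seed >> bit) & 1) ^ 1
    candidates = [v for v in range(1 << width)
                  if ((v >> bit) & 1) == wanted and constraint(v)]
    if not candidates:
        return None
    return min(candidates, key=lambda v: bin(v ^ seed).count("1"))


def expand(constraint, seed, width=8):
    """Mutate one seed solution into a larger set of likely solutions."""
    deltas = []
    for bit in range(width):
        neighbour = nearest_solution(constraint, seed, bit, width)
        if neighbour is not None:
            deltas.append(seed ^ neighbour)
    samples = {seed} | {seed ^ d for d in deltas}
    for a, b in combinations(deltas, 2):
        samples.add(seed ^ (a | b))      # combine two mutations, no solver call
    return samples


def path_constraint(x):
    """Toy path constraint on an 8-bit input."""
    return x % 4 == 1


samples = expand(path_constraint, seed=5)
valid = [s for s in samples if path_constraint(s)]
print(len(samples), len(valid))
```

Only one stand-in solver query per bit is needed, while the combinations are obtained by cheap bit operations. In general the combined candidates are only likely, not guaranteed, to satisfy the constraint, which is why the resulting fuzzing is called approximate.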


2 Tool Description & Configuration 


We implemented LEGION as a prototype in Python3 on top of the symbolic 
execution engine angr [8]. We have extended its solver backend, claripy, by 
the approximate path-preserving fuzzing algorithm, relying on the optimizer 
component of Z3 [2]. Binaries are instrumented to record execution traces. 


Installation. Download and unpack the competition archive (commit b2fc8430):
https://gitlab.com/sosy-lab/test-comp/archives-2020/blob/master/2020/legion.zip
LEGION requires Python 3 with python-setuptools installed, and gcc-multilib
for the compilation of C sources. Necessary libraries compiled for Ubuntu 18.04
are included in the subfolder lib (modified versions of angr, claripy and their
dependencies). The archive contains the main executable, Legion.py, and a
wrapper script, legion-sv, that includes lib into PYTHONPATH. The version tag
is 0.1-testcomp2020; options can be shown with python3 ./Legion.py --help.

Configuration. In the competition, we ran ./legion-sv with these parameters:
  --save-tests         save test cases as xml files in Test-Comp format
  --persistent         keep running when no more symbolic solutions are found
                       (mitigates issue with dynamic memory allocations)
  --time-penalty 0     do not penalise a node for expensive constraint-solving
                       (experimental feature, not yet evaluated)
  --random-seed 0      fix the random seed for deterministic result
  --symex-timeout 10   limit symbolic execution and constraint solving to 10s
  --conex-timeout 10   limit concrete binary execution to 10s
In the category cover-branches, we additionally use this flag:
  --coverage-only      don’t stop when finding an error
Finally, -32 and -64 indicate whether to use 32 or 64 bits (this affects binary
compilation and the sizes for nondeterministic values of types int, ...).


Participation. LEGION participates in all categories of Test-Comp 2020. 

Software Project and Contributors. LEGION is principally developed by 
Dongge Liu, with technical and conceptual contributions by all authors of this 
paper. LEGION will be made available at https://github.com/Alan32Liu/Legion. 


3 Discussion 


LEGION is competitive in many categories of Test-Comp 2020, achieving within 90% 
of the best score in 2 of 9 error categories and 7 of 13 coverage categories. 


LEGION’s instrumentation and exploration algorithm can accurately model
the program. Consider the benchmark standard_copy2_ground-1.c in Fig. 3.
With a single symbolic execution through the entire program over a trace found
via initial random inputs, LEGION understands that all guards of the for loops can
only evaluate in one way, and so omits them from the selection phase. It does
discover that the assertion inside the last loop contributes interesting decisions,
however, and will come up with two different ways to evaluate the comparison
a1[i] == a3[i], one of which triggers the error. With such an accurate model in
combination with its principled MCTS search strategy, LEGION is particularly
good at covering corner cases in deep loops: All other tools failed to score full
marks in standard_copy*_ground-*.c benchmarks, but LEGION succeeded in 9
out of 18. We can furthermore solve benchmarks where pure constraint solving
fails, e.g., when the solver times out on hard constraints of complex paths, we
label the respective branches for pure random exploration.

void main() {
  int N=100000, a1[N], a2[N], a3[N], i;
  for (i=0; i<N; i++) {
    a1[i] = input(); a2[i] = input();
  }
  for (i=0; i<N; i++) a3[i] = a1[i];
  for (i=0; i<N; i++) a3[i] = a2[i];
  for (i=0; i<N; i++) assert(a1[i] == a3[i]);
}

Fig. 3: standard_copy2_ground-1.c
While instrumentation provides accurate information on the program, its
currently naive implementation significantly slows down the concrete execution
of programs with long execution traces. We mitigate this weakness by setting a
time limit on the concrete executions. As a consequence, inputs that correspond
to long concrete executions are not saved. In the future, we plan to explore Intel’s
PIN tool, which offloads binary tracing into the CPU with negligible overhead.

LEGION inherits some limitations from angr as a symbolic execution backend.
Some benchmarks, such as array-tiling/mbpr5.c, dynamically allocate
memory with a symbolic size that depends on the input. angr eagerly concretises
this value, producing unsatisfiable path constraints for a feasible execution
path. LEGION detects this inconsistency as soon as it encounters the
feasible path and omits the erroneous node from selection. This helps e.g. on
bubblesort-alloca-1.c, where LEGION achieved full coverage (in contrast to most
other participants) despite the dynamic allocations.

LEGION performed poorly on the benchmark sets bitvector and ssh-simplified.
These programs have long sequences of equality constraints that are hard to
satisfy with fuzzing. This happens to be an extreme example of the parent-child
trade-off that LEGION intends to balance, where fuzzing the parent gives
nearly no reward. This could potentially be mitigated by decreasing LEGION’s
exploration ratio in the UCT score, but we have not attempted such fine-tuning.

Another problem is memory allocation when loop counters or array sizes are randomly
chosen to be very large in 64-bit mode, which leads to excessively long concrete
execution traces that cause timeouts or memory exhaustion. We plan to periodically
prune the in-memory representation of the tree in the future.